A CONTRIBUTION TO THE THEORY OF SELF-RENEWING
AGGREGATES, WITH SPECIAL REFERENCE TO
INDUSTRIAL REPLACEMENT

By Alfred J. Lotka

1. Introduction. The analysis of problems of industrial replacement forms
part of the more general analysis of problems presented by “self-renewing
aggregates.”¹ While the subject could, therefore, be treated in general and
consequently rather abstract terms, for the purpose of exposition it will be
advantageous to relate the discussion to concrete applications. These, in the
past, have been mainly of two kinds, namely, first, applications to population
analysis with related problems in genetics on the one hand and actuarial
problems on the other; and second, applications to industrial replacement. As the
fundamental setting of the two types of problems is very similar, leading in
each case to certain integral equations, it will be advantageous to consider
both problems together, as two phases of the general problem. This will
incidentally give us an opportunity to observe the analogy, and also certain
points of difference, between the two aspects of the problem.

Historically, the investigation of an actuarial problem came first. L. Herbelot²
(1909) examined the number of annual accessions required to maintain a
body of N policyholders constant, as members drop out by death. He assumes
an initial body of N “charter” members at time t = 0, all of the same age, which
for simplicity may be called age zero, since this merely amounts to fixing an
arbitrary origin of the age scale. He further assumes the same uniform age at
entry for each “new” member.

Then, if p(t) is the probability at the age of entry of surviving t years, the
survivors of charter members at time t will number N p(t); and if f(τ) is the
rate per head at which members drop out by death at time τ, being then
immediately replaced by a new member of the fixed age of entry, then the survivors
at time t of “new” members will evidently be given by

        N \int_0^t f(\tau)\, p(t - \tau)\, d\tau

¹ I use here an English equivalent, as nearly as possible, to the German phrase “sich
erneuernde Gesamtheiten,” used by Swiss actuaries.

² Herbelot’s original paper is disfigured with a number of misprints. It is essentially
reproduced, with the errors corrected, in a paper by R. Risser (1912). The same treatment
of the problem is also given by Zwinggi (1931) and by Schulthess (1935), (1937).


Hence, the condition for a constant membership N is

(1)     N p(t) + N \int_0^t f(\tau)\, p(t - \tau)\, d\tau = N
or

(2)     p(t) + \int_0^t f(\tau)\, p(t - \tau)\, d\tau = 1

Differentiating with regard to t, and remembering that p(0) = 1, we have

(3)     p'(t) + \int_0^t f(\tau)\, p'(t - \tau)\, d\tau + f(t) = 0

Equation (3) may be written

(4)     f(t) = -p'(t) - \int_0^t f(\tau)\, p'(t - \tau)\, d\tau
or, putting (t − τ) = a,

(5)     f(t) = -p'(t) - \int_0^t f(t - a)\, p'(a)\, da

For the solution of the integral equation thus obtained Herbelot uses the
method of successive differentiations,³ duly pointing out its limitations, and
applying it to several specific expressions for the survival function p(a).
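
Where successive differentiation does not terminate, equation (5) can still be solved step by step on a time grid. The following is a minimal sketch and not Herbelot's method: it applies the trapezoidal rule to the integral and solves for f at each grid point. The name `renewal_rate`, the step `h`, and the exponential survival curve used as a check are illustrative assumptions.

```python
import math

def renewal_rate(neg_dp, h, n_steps):
    """Solve f(t) = -p'(t) - integral_0^t f(t - a) p'(a) da  (equation (5)).

    neg_dp(a) returns -p'(a); the integral is approximated by the
    trapezoidal rule on a grid of step h (an assumed discretization).
    """
    d = [neg_dp(k * h) for k in range(n_steps + 1)]
    f = [d[0]]                      # at t = 0 the integral vanishes
    for k in range(1, n_steps + 1):
        # interior trapezoid nodes a = h, 2h, ..., (k - 1)h
        inner = sum(d[j] * f[k - j] for j in range(1, k))
        rhs = d[k] + h * (inner + 0.5 * d[k] * f[0])
        # the a = 0 end node contains the unknown f[k]; solve for it
        f.append(rhs / (1.0 - 0.5 * h * d[0]))
    return f

# Hypothetical check: p(a) = exp(-lam * a) has a constant death rate lam,
# so the replacement rate per head should stay at lam.
lam = 0.5
f = renewal_rate(lambda a: lam * math.exp(-lam * a), h=0.01, n_steps=1000)
max_error = max(abs(x - lam) for x in f)
```

A survival curve given only as a table could be passed in the same way, with `neg_dp` returning finite differences of the tabulated values.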

There is nothing in Herbelot's treatment to limit its application to living
organisms. It is directly applicable to the problem of industrial replacement
of an equipment comprising N original units installed at time t = 0, and
maintained constant by the replacement of disused units with new.

Next in chronological order, of publications dealing with the type of problem
with which we are here concerned, is a paper by Sharpe and Lotka (1911), who
use Hertz’s form of solution for the integral equation involved.⁴ To this I wish
to refer in some detail, adding to the original exposition in the light of later
developments. The treatment of the subject proceeds here along somewhat
broader lines, but, with obvious changes in the meaning of the symbols, and
with certain modifications and limitations which are themselves of interest,
the development is immediately applicable to economic systems composed of
units having a characteristic “mortality” in use.

³ This method is also followed in dealing with the problem of renewal by Risser (1912),
(1920); Zwinggi (1931); Schulthess (1935), (1937); Preinreich (1938). All these authors
applied their reflections to arbitrarily assumed frequency distributions for the renewal
function, of simple analytical form. For example, among the more recent applications is
one by Schulthess, who uses the function p(t) = …; and quite recently, Preinreich
has suggested the use of a Type I Pearson frequency curve on the basis of Kurtz’s
observational data. It is to be noted, however, that when it comes to actual application,
Preinreich does not use an ordinary Pearson Type I curve nor actual observational data of any
kind, but very conveniently simplifies the Pearson formula by giving integral values,
namely 1 and 2, to the exponents, thereby reducing to triviality the task of applying the
method of differentiation. None of these authors makes any attempt to deal with actual
numerical observations which, in practice, fall far wide of any of the simple analytical
formulae employed by them.

⁴ P. Hertz, Mathematische Annalen, 1908, vol. 65, pp. 84 to 86.

A population of living organisms, unlike industrial equipment, has practically
no beginning. We know its existence only as a continuing process. Accordingly
the equation for its development is most naturally framed without explicit
reference to any “charter members.”

The basis of the analysis is as follows:

In a population growing solely by excess of births over deaths (i.e. in the
absence of immigration and emigration), the annual female births B(t) at time t
are the daughters of mothers a years old, born at time (t − a) when the annual
female births were B(t − a). If fertility and mortality are constant and such
that a fraction p(a) of all births survive to age a, and are then reproducing at
an average rate m(a) daughters per head per annum, then, evidently,⁵

(6)     B(t) = \int_0^\infty B(t - a)\, p(a)\, m(a)\, da

(7)          = \int_0^\infty B(t - a)\, \varphi(a)\, da.

This is the fundamental equation in its original form, and, as noted above,
it does not explicitly refer to any initial state, though, as will be seen presently,
in order to make the problem determinate, data regarding the system at some
particular period must be given. For the present we note that (7) can be
written

(8)     B(t) = \int_t^\infty B(t - a)\, \varphi(a)\, da + \int_0^t B(t - a)\, \varphi(a)\, da

(9)     B(t) = B_1(t) + \int_0^t B(t - a)\, \varphi(a)\, da.


It is to be noted that the right hand member of (8) splits the total births B(t)
into two sections, those for which (t − a) < 0, that is, births of daughters whose
mothers were born before t = 0; and those for which (t − a) > 0, that is, births
of daughters whose mothers were born after t = 0. The former section is
denoted by B_1(t) in (9). The function B_1(t) thus defined will be found, in the
further development, to play a significant rôle. Here it will suffice to point out
that it vanishes for all values of t greater than ω, the upper limit of the
reproductive period, because φ(a) vanishes for these values of a.

⁵ Here and elsewhere in these developments the limits of the integral have, for simplicity,
been written 0 and ∞. This ensures the inclusion of all nonvanishing terms in the
integrand; the inclusion of terms for which either φ(a) or B(t − a) vanishes does not, of course,
affect the value of the integral. If φ(a) is represented between the limits α, ω of the
reproductive period by some analytical expression, such as a Pearson frequency function, it is, of
course, understood that outside the range α, ω we must put φ(a) = 0.


2. Special case. A case of special interest is that in which B_1(t) represents
the births of daughters whose mothers were all born in an interval of time
t = −dt to t = 0. In that case the first integral in (8) reduces to a single term,
so that

(10)    B(t) = B(0)\, \varphi(t)\, dt + \int_0^t B(t - a)\, \varphi(a)\, da
or, putting

(11)    B(0)\, dt = N_0

(12)    B(t) = N_0\, \varphi(t) + \int_0^t B(t - a)\, \varphi(a)\, da.

This last equation holds also if a finite number of births take place (or are
regarded as taking place) at a point of time t = 0.

Equations (10) and (12) are of interest as basic for the examination of the
progeny of an infinitesimal population element,⁶ that is, of a “zero” generation,
born at time zero. In that case B_1(t) is the annual rate of births in the “first”
generation, and is simply proportional to φ(t), i.e.

(13)    B_1(t) = N_0\, \varphi(t)


For the sake of greater generality the development has so far been given in
terms of the phenomenon of replacement (reproduction) as it presents itself
in a population of living organisms. But it should be noted here that, with
appropriate changes in the meaning of φ(a), equation (12) is directly applicable
to the problem of industrial renewal in an installation originally installed at
some point of time and maintained at a constant level by the replacement of
each unit by a new one, the moment it is disused. In that case the “rate per
head of reproduction” m(a) at age a is evidently the same thing as the “death
rate per head” at age a, namely

(14)    \mu(a) = -\frac{1}{p(a)} \frac{dp(a)}{da} = -\frac{p'(a)}{p(a)}

so that

(15)    \varphi(a) = p(a)\, \mu(a)

becomes

(16)    \varphi(a) = -p'(a).

⁶ A. J. Lotka, (1928), (1929).

Reverting now to the fundamental equation in its first form (6), a trial
substitution

(17)    B(t) = Q e^{rt}

is found to satisfy this equation, provided that r is a root of the characteristic
equation

(18)    1 = \int_0^\infty e^{-ra}\, \varphi(a)\, da.

We may speak of (17) as a particular solution of (6) or (7). It is easily seen
that the sum of such particular solutions is also a solution, i.e.

(19)    B(t) = Q_1 e^{r_1 t} + Q_2 e^{r_2 t} + \cdots

where r_1, r_2, etc., are roots of the characteristic equation (18).⁷

For real values of r the function

(20)    F(r) = \int_0^\infty e^{-ra}\, \varphi(a)\, da

decreases monotonically as r increases, since, from its nature, φ(a) ≥ 0 for all
values of a. Hence (18) can have only one real root r_1, and we shall have

(21)    r_1 \gtreqless 0 \quad \text{according as} \quad \int_0^\infty \varphi(a)\, da \gtreqless 1.

If u + iv is a complex root of (18) then

(22)    1 = \int_0^\infty e^{-ua} \cos(va)\, \varphi(a)\, da

(23)    0 = \int_0^\infty e^{-ua} \sin(va)\, \varphi(a)\, da

and it is evident from (22) that u < r_1, since cos(va) ≤ 1 for all values of a.
The real part of any complex root of (18) is, therefore, algebraically less than
the real root r_1.

This reasoning⁸ is evidently quite independent of the particular form of φ(a),
and is thus equally true, whether φ(a) be given in purely empirical form (defined
by a table of values), or as a standard form of frequency curve, such as for
example a Pearson curve of suitable type.
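
Because the function (20) decreases monotonically, the single real root can be bracketed and found by bisection even when φ(a) is known only as a table. The sketch below is an illustration under assumptions: the name `real_root`, the input layout (each table entry already multiplied by its age-interval width), and the three small schedules are hypothetical.

```python
import math

def real_root(ages, phi, tol=1e-12):
    """Real root r_1 of 1 = sum_k phi[k] * exp(-r * ages[k]).

    ages, phi: an empirical table of phi(a), with each entry already
    multiplied by its age-interval width (assumed input layout).
    """
    def F(r):
        return sum(w * math.exp(-r * a) for a, w in zip(ages, phi))

    lo, hi = -1.0, 1.0
    while F(lo) < 1.0:   # F decreases in r, so move lo left until F(lo) >= 1
        lo *= 2.0
    while F(hi) > 1.0:   # and hi right until F(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ages = [1.5, 2.5, 3.5]                           # hypothetical interval midpoints
growing = real_root(ages, [0.5, 1.0, 0.5])       # integral of phi is 2 > 1
stationary = real_root(ages, [0.25, 0.5, 0.25])  # integral of phi is 1
declining = real_root(ages, [0.1, 0.3, 0.1])     # integral of phi is 0.5 < 1
```

The three calls reproduce the sign rule (21): the root is positive, zero, or negative according as the integral of φ(a) exceeds, equals, or falls short of unity.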

⁷ For a discussion of the convergence of the series (19) see G. Herglotz, Mathem. Annalen,
1908, vol. 65, pp. 87 et seq.

⁸ Adapted from P. Hertz, Math. Annalen, 1908, vol. 65, pp. 1-86; G. Herglotz, ibid. pp.
87-100. The Hertz solution is also applied to a similar problem by J. B. S. Haldane, Proc.
Cambridge Phil. Soc., 1926, vol. 23, p. 607. A particularly detailed development is given by
H. T. J. Norton, Proc. London Math. Soc., 1926, vol. 28, p. 21.

The roots of (18) can be determined directly, though rather laboriously, from
equations (22) and (23); or, they can be brought into relation with the Thiele
seminvariants μ of the function φ(a) defined by

(24)    F(r) = \int_0^\infty e^{-ra}\, \varphi(a)\, da = \exp\left( \log_e m_0 - \mu_1 r + \frac{\mu_2 r^2}{2!} - \frac{\mu_3 r^3}{3!} + \cdots \right)

where m_n is the n-th moment of φ(a) and the seminvariants μ can be computed
from the moments by the algorithm

(25)    m_1 = \mu_1 m_0
        m_2 = \mu_1 m_1 + \mu_2 m_0
        m_3 = \mu_1 m_2 + 2\mu_2 m_1 + \mu_3 m_0
        m_4 = \mu_1 m_3 + 3\mu_2 m_2 + 3\mu_3 m_1 + \mu_4 m_0
        etc.
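
The algorithm for passing from moments to seminvariants is a triangular recursion and is easily mechanized. The sketch below assumes the general pattern behind the four lines quoted above, namely binomial coefficients C(n−1, k−1) multiplying μ_k m_{n−k}; the function name `seminvariants` and the test distribution are illustrative choices, not part of the original.

```python
from math import comb

def seminvariants(moments):
    """Thiele seminvariants mu_1..mu_n from moments m_0..m_n.

    Inverts the recursion quoted above, written generally as
    m_n = sum_{k=1..n} C(n-1, k-1) * mu_k * m_{n-k}.
    """
    m = list(moments)
    mu = [None]                      # placeholder so that mu[k] is mu_k
    for n in range(1, len(m)):
        known = sum(comb(n - 1, k - 1) * mu[k] * m[n - k] for k in range(1, n))
        mu.append((m[n] - known) / m[0])
    return mu[1:]

# Hypothetical check: phi(a) = exp(-a) has moments m_n = n!
# and seminvariants (n - 1)!.
mus = seminvariants([1, 1, 2, 6, 24])
```

For n = 3 and n = 4 the general pattern reduces to the third and fourth lines of the algorithm as printed.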

In terms of these seminvariants the characteristic equation (18) becomes

(26)    \mu_1 r - \frac{\mu_2 r^2}{2!} + \cdots - \log_e m_0 = \log_e 1 = 2\pi n i

where n takes on all positive and negative integral values. Separating the real
and imaginary parts in (26), and retaining seminvariants up to the fourth,

(27)    \psi(u, v) = \frac{\mu_4}{24}(u^4 - 6u^2v^2 + v^4) - \frac{\mu_3}{6}\, u(u^2 - 3v^2) + \frac{\mu_2}{2}(u^2 - v^2) - \mu_1 u + \log_e m_0 = 0

(28)    \chi(u, v) = \frac{\mu_4}{6}\, uv(u^2 - v^2) + \frac{\mu_3}{6}\, v(v^2 - 3u^2) + \mu_2 uv - \mu_1 v = 2\pi n.

If φ(a) does not differ too widely from the normal (Gaussian) distribution, so
that seminvariants of higher than second order can be neglected for roots in the
neighborhood of u = 0, v = 0, we shall have, approximately⁹

(29)    \frac{\mu_2}{2}(u^2 - v^2) - \mu_1 u + \log_e m_0 = 0

(30)    \mu_2 uv - \mu_1 v = 2\pi n

or, putting

(31)    U = u - \frac{\mu_1}{\mu_2}

we have

(32)    U^2 - v^2 = \left(\frac{\mu_1}{\mu_2}\right)^2 - \frac{2 \log_e m_0}{\mu_2}

(33)    Uv = \frac{2\pi n}{\mu_2}

⁹ The relations which follow hold exactly if φ(a) is actually a normal curve. It should be
noted, however, that this can not be strictly the case, since the infinite tail of the curve on
the negative side would imply replacement or reproduction antedating the original
installation or zero generation. Nevertheless, a normal frequency curve will be admissible if the
part of the curve extending into the negative age field is negligible. For a concrete example
(electric light bulbs) see E. J. Gumbel, “Die Verteilung der Gestorbenen um das
Normalalter,” Aktuárské Vědy (Praze), 1933, p. 90.
It is thus seen that in these circumstances the roots u, v correspond to the
points of intersection of the hyperbola (32) centered at u = μ_1/μ_2, v = 0, with
a family of hyperbolas (33) concentric with (32), but with their axes at 45° to
those of (32).

The intersections of the hyperbolas (33) with the axis of v are given by
putting u = 0 in (30), namely

(30a)   v = -\frac{2\pi n}{\mu_1}

This also gives, approximately, the frequency of the oscillatory components for
which u is sufficiently small. In particular, for the first component, we have,
in that case

(30b)   v = -\frac{2\pi}{\mu_1}

so that its wave length is (approximately) μ_1, the mean of the φ(a) curve.

These facts are illustrated in Fig. 1, drawn to scale according to the vital
statistics of the United States, 1920, for which the requisite computations were
available from prior publications (Lotka, (1928), (1929)). The diagram is
drawn in full, showing four intersections of each hyperbola of the family (33).
Actually values of v occur in pairs, corresponding to conjugate roots u ± iv.
The intersections in the two upper quadrants must be disregarded, as they do not
correspond to roots of (18).

Fig. 1. Roots of Fundamental Equation (18) as Intersections of Curve (32)
with Family of Curves (33)

To simplify notation let us write (32), (33) in the form

(32a)   U^2 - v^2 = X

(33a)   Uv = C.

Solving for U^2, v^2 we find

(34)    U^2 = \tfrac{1}{2}\{X \pm \sqrt{X^2 + 4C^2}\}

(35)    v^2 = \tfrac{1}{2}\{-X \pm \sqrt{X^2 + 4C^2}\}

from which, incidentally, it is seen that

(36)    U^2 + v^2 = \sqrt{X^2 + 4C^2}

and hence, that the intersections of the hyperbola (32) with (33) lie on circles
of radius

(37)    R = \sqrt[4]{X^2 + 4C^2}.

When the third and fourth moments (and therefore third and fourth
seminvariants) are taken into account¹⁰ the hyperbolas become distorted into new
curves, though the general topographic features of the diagram tend to be
preserved. In particular, the property of orthogonality of intersection of the
curves (32) with (33) is preserved, in accordance with a well-known property of
conjugate functions.¹¹ This is shown in the left hand panel of Fig. 2, drawn
for the same data as Fig. 1, but including not only the hyperbolic curves, but
also the corresponding modified curves obtained by retaining the third and
fourth seminvariants in the computation.¹² Only the quadrant relevant to the
location of the roots is shown.
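
The construction by intersecting hyperbolas, in the approximation that keeps only the first two seminvariants, reduces to a few lines of arithmetic. The sketch below is offered under assumptions: the name `normal_approx_root` is hypothetical, the branch U < 0 is chosen so that the real part stays below the real root, and the seminvariants μ_1 = 10, μ_2 = 6.7192 are those listed for the industrial example in Table II (not the population data of Fig. 1).

```python
import math

def normal_approx_root(mu1, mu2, n, log_m0=0.0):
    """Root u + iv of (29), (30) as an intersection of hyperbolas (32), (33).

    Assumes X > 0 or n != 0, so that U below is not zero.
    """
    X = (mu1 / mu2) ** 2 - 2.0 * log_m0 / mu2   # (32a)
    C = 2.0 * math.pi * n / mu2                 # (33a)
    radius_sq = math.sqrt(X * X + 4.0 * C * C)  # U^2 + v^2, equation (36)
    # Assumption: take the branch U < 0, so that u lies to the left of the
    # centre mu1/mu2, in keeping with u < r_1 for complex roots.
    U = -math.sqrt(0.5 * (X + radius_sq))       # (34), upper sign
    v = C / U                                   # from (33a)
    return U + mu1 / mu2, v

mu1, mu2 = 10.0, 6.7192                         # seminvariants from Table II
u, v = normal_approx_root(mu1, mu2, n=1)
res29 = 0.5 * mu2 * (u * u - v * v) - mu1 * u   # left side of (29), log m0 = 0
res30 = mu2 * u * v - mu1 * v - 2.0 * math.pi   # (30) with n = 1
```

Both residuals vanish to rounding error, and the resulting u and |v| fall close to the first-component values of Table III, which were computed with four seminvariants.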

¹⁰ Which is as far as curve fitting by Pearson’s method goes.

¹¹ See, for example, W. E. Byerly, Integral Calculus, 1888, p. 289.

¹² For a given value of u equation (27) is a biquadratic in v, and equation (28) is a cubic
in v lacking the second degree term. The computation of the curves is in consequence
relatively simple.

Fig. 2. Roots of Fundamental Equation (18) as Intersections of Curve (27) or (32)
with Family of Curves (28) or (33) Respectively
Left panel: Population Growth. Right panel: Industrial Replacement.
Computed on basis of first two seminvariants: equations of hyperbolas (32), (33).
Data from Kurtz, E. B., “Life Expectancy of Physical Property,” 1930, page 104, fig. 50.
Data from Lotka, A. J., “The Progeny of a Population Element,” American Journal of Hygiene, 1928, page 875.

3. The coefficients Q in the solution (19). These are determined by initial
conditions, being, in fact, related to the function B_1(t). As their determination
in the original papers by Hertz and Herglotz is rather complicated, the following
relatively simple method, resembling that by which the constants in a Fourier
series are determined, is of interest:

Multiplying equation (9) by e^{-r_j t}, where r_j is a root of (18), transposing
terms, and integrating between the limits 0 and ω, where ω is the highest age for
which φ(a) has a value other than zero, we have

(38)    \int_0^\omega e^{-r_j t} B_1(t)\, dt = \int_0^\omega e^{-r_j t} \left\{ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right\} dt

Introducing the solution (19) in the right hand member of (38), we obtain

(39)    \int_0^\omega e^{-r_j t} B_1(t)\, dt = \sum_i Q_i \int_0^\omega e^{-r_j t} \left\{ e^{r_i t} - \int_0^t e^{r_i (t - a)}\, \varphi(a)\, da \right\} dt

(40)        = \sum_i P_{ij} \qquad (i = 1, 2, 3, \ldots).

Consider now a particular term P_{ij} in the sum Σ. Multiplying out the
exponentials we obtain

(41)    P_{ij} = Q_i \int_0^\omega e^{(r_i - r_j) t} \left\{ 1 - \int_0^t e^{-r_i a}\, \varphi(a)\, da \right\} dt

which, in view of the characteristic equation (18) reduces to

(42)    P_{ij} = Q_i \int_0^\omega e^{(r_i - r_j) t} \int_t^\omega e^{-r_i a}\, \varphi(a)\, da\, dt

(43)        = Q_i \int_0^\omega e^{-r_i a}\, \varphi(a) \int_0^a e^{(r_i - r_j) t}\, dt\, da

Hence, if i ≠ j,

(44)    P_{ij} = \frac{Q_i}{r_i - r_j} \int_0^\omega \left[ e^{(r_i - r_j) a} - 1 \right] e^{-r_i a}\, \varphi(a)\, da

(45)        = \frac{Q_i}{r_i - r_j} \left\{ \int_0^\omega e^{-r_j a}\, \varphi(a)\, da - \int_0^\omega e^{-r_i a}\, \varphi(a)\, da \right\}

(46)        = 0

since r_i and r_j are both roots of (18). But if i = j, then (44) is of the
indeterminate form 0/0 and we must refer back to equation (43), from which, with
i = j, we obtain, instead of (44), a different expression, namely

(47)    P_{jj} = Q_j \int_0^\omega e^{-r_j a}\, \varphi(a) \int_0^a dt\, da

(48)        = Q_j \int_0^\omega a\, e^{-r_j a}\, \varphi(a)\, da

so that the only term in the sum Σ in equation (40) that does not vanish is the
term P_{jj}, and finally

(49)    Q_j = \frac{P_{jj}}{\int_0^\omega a\, e^{-r_j a}\, \varphi(a)\, da}

(50)        = \frac{\int_0^\omega e^{-r_j t} B_1(t)\, dt}{\int_0^\omega a\, e^{-r_j a}\, \varphi(a)\, da}

(51)        = \frac{\int_0^\omega e^{-r_j t} \left[ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right] dt}{\int_0^\omega a\, e^{-r_j a}\, \varphi(a)\, da}

or, finally, in view of (20)

(52)    Q_j = \frac{\int_0^\omega e^{-r_j t} \left[ B(t) - \int_0^t B(t - a)\, \varphi(a)\, da \right] dt}{-\{F'(r)\}_{r = r_j}}
The coefficients Q are thus fully determined by (50) or its equivalents (51)
or (52), when initial conditions are given, that is, when the function B_1(t) is
given for 0 < t < ω or, what amounts to the same thing, when B(t) is known
for this range of values of t. For complex roots the denominator in (52)
becomes,¹³ in view of (27), (28),

(53)    -\left[\frac{dF(r)}{dr}\right]_{r = r_j} = -\left(\frac{\partial\psi}{\partial u} + i\, \frac{\partial\chi}{\partial u}\right) = G - iH

where G and H can be expressed in terms of the seminvariants by partial
differentiation of (27), (28) with regard to u, namely

(54)    G = \mu_1 - \mu_2 u + \frac{\mu_3}{2}(u^2 - v^2) - \frac{\mu_4}{6}(u^3 - 3uv^2) + \cdots

(55)    H = \mu_2 v - \mu_3 uv + \frac{\mu_4}{6}(3u^2 v - v^3) + \cdots

¹³ Since r_j is a root of F(r) = 1, we have

        \left[\frac{dF(r)}{dr}\right]_{r = r_j} = \left[\frac{1}{F(r)} \frac{dF(r)}{dr}\right]_{r = r_j} = \left[\frac{d \log_e F(r)}{dr}\right]_{r = r_j}

In the special case that the “zero generation” is composed of N_0 individuals
(or “units”) all born (or “entering”) at time zero, the coefficients Q are
correspondingly simplified in form. For the term in the real root r we have

(56)    Q = \frac{N_0}{G}

Conjugate complex root terms unite in pairs, giving

(57)    \frac{2 N_0}{G^2 + H^2}\, e^{ut} (G \cos vt - H \sin vt).

Unless φ(a) is a normal distribution, the computation of the roots u, v, and
the coefficients G, H, in terms of seminvariants becomes impracticable for higher
order roots, which then have to be computed directly and laboriously from
equations (22), (23). In practice components of very high order will hardly be
needed, nor will their use be warranted, since the high order seminvariants,
which are then involved, are not usually known with sufficient accuracy. An
exception occurs when the φ(a) curve is essentially of the nature of a composite
curve. This is what actually happened in the case of the curve of reproduction
for a human population. For details on this point the reader must be referred
to my paper “The Progeny of a Population Element”.
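
The coefficients G and H can be checked against the tabulated example of the later sections. The sketch below makes its assumptions explicit: H is taken as the partial derivative of (28) with regard to u, that is μ_2 v − μ_3 uv + (μ_4/6)(3u²v − v³); G is taken from (54); and the pair amplitude 2N_0/(G² + H²) follows from assuming Q = N_0/(G − iH) for a zero generation of N_0 units. The function name `g_and_h` is hypothetical. The numerical inputs are the seminvariants of Table II and the first oscillatory root of Table III.

```python
def g_and_h(u, v, mu1, mu2, mu3, mu4):
    """G and H of equations (54), (55), truncated after the fourth seminvariant."""
    G = mu1 - mu2 * u + 0.5 * mu3 * (u * u - v * v) - (mu4 / 6.0) * (u ** 3 - 3.0 * u * v * v)
    H = mu2 * v - mu3 * u * v + (mu4 / 6.0) * (3.0 * u * u * v - v ** 3)
    return G, H

# Seminvariants from Table II and the first oscillatory root from Table III.
G, H = g_and_h(-0.11009, 0.57767, 10.0, 6.7192, -1.3007, -12.1228)
N0 = 100_000
amplitude_cos = 2 * N0 * G / (G * G + H * H)   # coefficient of e^{ut} cos vt
amplitude_sin = 2 * N0 * H / (G * G + H * H)   # coefficient of -e^{ut} sin vt
# Real root u = v = 0: H vanishes and G reduces to mu_1, so Q = N0 / mu_1.
aperiodic = N0 / g_and_h(0.0, 0.0, 10.0, 6.7192, -1.3007, -12.1228)[0]
```

With these inputs G and H agree with the entries 11.1688 and 4.1458 of Table III to the accuracy printed there.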

4. Alternative Representation of the Function B(t). By the application of
the Hertz-Herglotz solution of the integral equation (6), the evolution of a
population or aggregate is represented as the resultant of a series of damped
oscillations.

Additional insight into the nature of the renewal process is gained by viewing
the total renewals as composed of contributions from successive “generations”.¹⁵
This leads to an alternative representation, in which the evolution of the
aggregate appears as the sum of a series of frequency curves, each corresponding
to the contribution of one generation to the total births or replacements at
time t.¹⁶

In order to realize this second representation we note, first of all, that equation (7)
applies not only to the total births at time t, but, with slight modification, also
to the births in any particular generation. Here it will be convenient to
consider the special case of a zero generation of N_0 individuals (or units) all born
(or installed) at time t = 0.

The births (or renewals) in the “first” generation, that is offspring of the
zero generation, or renewals of disused units of the zero generation, will be
distributed in time according to the equation

(58)    B_1(t) = N_0\, \varphi(t).

For the second generation, or renewals of disused units of the first generation,
we shall have

(59)    B_2(t) = \int_0^t B_1(t - a)\, \varphi(a)\, da

and, generally, for the (j + 1)th generation¹⁷

(60)    B_{j+1}(t) = \int_0^t B_j(t - a)\, \varphi(a)\, da.

¹⁴ For details see A. J. Lotka, The Progeny of a Population Element, p. 892.

¹⁵ In the case of a population the term “generation” calls for no explanation: mother,
daughter and granddaughter, for example, represent three generations; in the case of
industrial replacement, the term is to be understood in this sense, that the original
installation constitutes the original or zero generation, the units introduced to replace disused
units of the zero generation constitute the “first” generation, renewal of these the second,
and so on.

This explanation may seem unnecessary. However, from some correspondence received
by the writer it seems that perhaps some readers have confused the generations thus defined
with successive “cycles” of duration equal to the extreme “length of life” of the units.
With such “cycles” we are not here concerned.

¹⁶ This alternative approach to the problem bears some superficial resemblance to a
method followed by R. Frisch in his article “Sammenhengen mellem primaerinvesteringen
og reinvestering” (Statsøkonomisk Tidsskrift, 1927, p. 117). Frisch also follows up the
distribution in time of first, second, and higher order replacements, and gives diagrams
bearing a superficial resemblance to Fig. 4 in the present text. But Frisch’s development
has otherwise little in common with that here presented. He deals with equipment
composed of various units, with expectation of life varying discontinuously or continuously
from one unit to another, but fixed at a single value for a given unit. To use one of his own
examples, it is as if a wooden hammer with a life of one year were always replaced by
another wooden hammer, also with a life of one year, and so on: while a steel hammer,
with a life of three years, were always replaced by another steel hammer, also with a life of
exactly three years. The analogous case in population analysis would be presented by a
population in which length of life were strictly hereditary, so that a man dying at age 50
would have a son, grandson, etc., each dying at age 50. In the field of industrial
replacement and in population analysis alike this is a highly unrealistic supposition.

Needless to say, with these basic assumptions, Frisch’s resulting equations differ
fundamentally from those here given, and the distribution curves for successive orders of
replacements, as shown in Frisch’s Fig. 3, do not have the property that the j-th seminvariant of the
k-th order replacement curve is k times that of the j-th seminvariant of the first order
curve, except for j = 1. The fact is that Frisch’s curves in his Fig. 3 are all similar, except
for a constant factor applied to the vertical scale and its reciprocal applied to the horizontal
scale. In this case all the corresponding seminvariants, except the first, are evidently
unchanged in passing from one curve to the next. Frisch, as a matter of fact, does not
introduce seminvariants into his discussion at all. The Hertz solution he could not
possibly introduce, since his fundamental equations are not of a form appropriate for the use of
the Hertz solution.

The later sections of Frisch’s paper deal with somewhat more complicated cases, but they
all involve the assumption of “strict heredity,” that is, the assumption that a unit with
length of life v is replaced by another having exactly the same length of life v. At any rate,
that is the understanding I have formed of the Danish text, studied with the assistance of a
native of Scandinavia. All the formulae in the text bear out this understanding.


Now, by a well-known property¹⁸ of the Thiele seminvariants, it follows from
(58), (59), (60), that the seminvariants of the distribution-in-time of the births
(or replacements) in the jth generation are simply the j-tuple of the
corresponding seminvariants of the first generation, that is, of φ(t).

Furthermore, it is easily shown that as j, the order of generation, increases,
the distribution of renewals approaches¹⁹ the normal (Gaussian) frequency
distribution.

By virtue of these properties the distribution curves for successive
generations are easily constructed.²⁰
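
The j-tuple property can be observed directly by repeated convolution of a tabulated first-generation curve with itself. The sketch below is illustrative only: it uses the replacement frequencies of Table I in unit age steps, indexes each age interval by its lower limit (an assumed convention), and checks the first three seminvariants, which for a normalized curve are the mean, the variance and the third moment about the mean. The function names are hypothetical.

```python
def convolve(x, y):
    """Discrete convolution of two frequency tables in unit age steps."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def first_three_seminvariants(w):
    """Mean, variance and third central moment of a frequency table w[age]."""
    total = sum(w)
    mean = sum(k * wk for k, wk in enumerate(w)) / total
    var = sum((k - mean) ** 2 * wk for k, wk in enumerate(w)) / total
    third = sum((k - mean) ** 3 * wk for k, wk in enumerate(w)) / total
    return mean, var, third

# Replacements per age interval from Table I, as fractions of 100,000 units.
phi = [x / 100_000 for x in
       [0, 0, 300, 900, 1800, 3000, 5700, 10300, 14100, 13900,
        13800, 13200, 10400, 6300, 3700, 2200, 400]]

generation = phi
seminv = [first_three_seminvariants(generation)]
for _ in range(2):                      # second and third generations
    generation = convolve(generation, phi)
    seminv.append(first_three_seminvariants(generation))
```

Each entry of `seminv[1]` is twice, and each entry of `seminv[2]` three times, the corresponding entry of `seminv[0]`, up to rounding.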

¹⁷ The births in the j-th generation extend at most from t = jα to t = jω, but it is not
necessary to take this into account in writing the limits of the integrals in (60) and
corresponding equations, because the inclusion or exclusion of vanishing terms in the integrand
does not affect the value of the integral. Similar remarks apply to the effect of the limited
range of φ(a). See also footnote 5.

¹⁸ For details, see A. J. Lotka, “The Progeny of a Population Element,” American
Journal of Hygiene, 1928, vol. 8, p. 875; also “The Spread of Generations,” Human Biology,
1929, vol. 1, p. 305.

¹⁹ In practice quite rapidly, even if φ(a) is far from normal.

²⁰ For the case in which φ(a) is a Pearson Type I curve, details of the process are given in
my paper “Industrial Replacement,” Skandinavisk Aktuarietidskrift, 1933, p. 51. I may
here remark that such a Pearson Type I curve for the distribution in the first generation
does not strictly give again a Pearson Type I curve in the second generation, because the
moments beyond the 4th are neglected in fitting such a curve. But it must be remembered
that the same neglect is practiced in the original fit of the data, so that the fit in the second
generation will in general be as adequate as that in the first, provided, of course, that
proper attention is paid to Pearson’s criteria.

The sum total of the contributions of successive generations should, of course,
agree with the expression for the total annual births B(t) at time t given by the
fundamental equation (9). In point of fact, by summing the left and the right
hand members of equations (58), (59), and (60) for all generations up to the
highest, say the n-th, “reproducing” at time t, we find

(61)    B(t) = \sum_{j=1}^{n+1} B_j(t) = B_1(t) + \int_0^t \sum_{j=1}^{n} B_j(t - a)\, \varphi(a)\, da.

Since the n-th is the highest generation contributing,²¹ the value of the integral
in (61) is not changed by writing n + 1 instead of n as the upper limit of the
summation sign on the right. But then (61) becomes simply

        B(t) = B_1(t) + \int_0^t B(t - a)\, \varphi(a)\, da

that is, summation of the contributions of individual generations to the total
annual births leads us back to the fundamental equation (9), which confirms
the correctness of our analysis.

²¹ The special case that the limiting n so defined is ∞ would require special discussion,
which, however, presents no great difficulty. As this case is of little if any practical
importance, this discussion is here omitted.
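
The agreement between the sum of the generation curves and the fundamental equation (9) can also be verified numerically. The following sketch works in unit time steps with the first-generation frequencies of Table I; the function name `generation_curves`, the 60-step horizon, and the convention that an age interval is indexed by its lower limit are assumptions made for illustration. It relies on φ vanishing at age zero, so that each generation starts later than the one before.

```python
def generation_curves(phi, n0, horizon):
    """B_1, B_2, ... on t = 0..horizon-1 from (58)-(60), in unit time steps."""
    first = [n0 * phi[t] if t < len(phi) else 0.0 for t in range(horizon)]
    curves = [first]
    for _ in range(horizon):            # more generations than can ever be needed
        prev = curves[-1]
        nxt = [sum(prev[t - a] * phi[a] for a in range(min(t, len(phi) - 1) + 1))
               for t in range(horizon)]
        if not any(nxt):                # this generation falls beyond the horizon
            break
        curves.append(nxt)
    return curves

# Replacements per age interval from Table I, as fractions of 100,000 units.
phi = [x / 100_000 for x in
       [0, 0, 300, 900, 1800, 3000, 5700, 10300, 14100, 13900,
        13800, 13200, 10400, 6300, 3700, 2200, 400]]

curves = generation_curves(phi, 100_000, horizon=60)
total = [sum(c[t] for c in curves) for t in range(60)]
# Residual of the discrete form of (9): B(t) - B_1(t) - sum_a B(t - a) phi(a).
residual = max(
    abs(total[t] - curves[0][t]
        - sum(total[t - a] * phi[a] for a in range(min(t, len(phi) - 1) + 1)))
    for t in range(60))
```

The residual is zero to rounding error, which is the discrete counterpart of the passage from (61) back to (9).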


TABLE I

Age Schedule of Survivorship and of Replacements²² in First Generation

Age Interval    Survivors from Original Installation      Replacements Within
                to Beginning of Specified Age Interval    Specified Age Interval

0-1             100,000                                   -
1-2             100,000                                   -
2-3             100,000                                   300
3-4              99,700                                   900
4-5              98,800                                   1,800
5-6              97,000                                   3,000
6-7              94,000                                   5,700
7-8              88,300                                   10,300
8-9              78,000                                   14,100
9-10             63,900                                   13,900
10-11            50,000                                   13,800
11-12            36,200                                   13,200
12-13            23,000                                   10,400
13-14            12,600                                   6,300
14-15             6,300                                   3,700
15-16             2,600                                   2,200
16-17               400                                   400
17-18                 -                                   -
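
Table I gives a discrete view of relation (16): the replacements in each age interval are the drop in survivors over that interval. The short check below is a sketch under two assumptions, that blank entries stand for zero and that replacements fall at the midpoints of their age intervals; the variable names are illustrative.

```python
# Columns of Table I, with blank entries read as zero (an assumption).
survivors = [100000, 100000, 100000, 99700, 98800, 97000, 94000, 88300,
             78000, 63900, 50000, 36200, 23000, 12600, 6300, 2600, 400, 0]
replacements = [0, 0, 300, 900, 1800, 3000, 5700, 10300,
                14100, 13900, 13800, 13200, 10400, 6300, 3700, 2200, 400, 0]

# Discrete form of phi(a) = -p'(a): replacements equal the drop in survivors.
drops = [survivors[k] - survivors[k + 1] for k in range(len(survivors) - 1)]
consistent = drops == replacements[:-1]

# Mean age at replacement, assuming replacements fall at interval midpoints.
n0 = sum(replacements)
mean_age = sum((k + 0.5) * r for k, r in enumerate(replacements)) / n0
aperiodic_rate = n0 / mean_age        # long-run replacements per unit time
```

The grouped mean comes to 10.008, close to the value 10 adopted for the fitted curve, so the long-run replacement rate is close to 10,000 units per unit of time.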


5. Application to Kurtz’s data. An extensive collection of numerical data
(mortality curves) on renewal of industrial equipment has been published by
E. B. Kurtz (1930), (1931). By way of example the analysis developed above
has been applied to the data “Group III,” as fitted by him with a Pearson
Type I curve, namely²³

(62)    \ldots = 14{,}950\, (1 + \ldots)^{\ldots} (1 - \ldots)^{\ldots}.

²² Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, Table 22, Cols. 6
and 6, p. 86, and p. 104, Fig. 50.

²³ The numerical values of the constants in the formula as here given differ slightly from
those given by Kurtz, perhaps owing to the retention by him of higher decimals in his
computations. There is also an inconsistency between Kurtz’s use of 10 for the mean in
his formula, whereas on his drawing the mean is placed at 100.






The aperiodic component is the number of units originally installed
(arbitrarily assumed as 100,000) divided by the mean of the frequency curve
(equation 62). Following Kurtz, the mean has also been arbitrarily made equal to 10,
which simply implies a particular choice of time unit. The fundamental data
and characteristics are set forth in Tables I and II. The first six oscillatory
components were computed retaining moments and seminvariants up to the fourth,
with the results shown in Table III and in Figs. 2 (right hand panel), 3 and 4.

TABLE II

Moments and Seminvariants of Curve of Replacements in First Generation²⁴

j     Moments²⁵ m_j     Seminvariants μ_j

0     100,000
1     0                 10²⁶
2     671,924           6.7192
3     -130,070          -1.3007
4     12,323,200        -12.1228

TABLE III

Constants of the Series Solution (19) of Integral Equation (7) for First Six
Oscillatory Components Computed from First Four Moments and
Seminvariants of an Industrial Replacement Curve²⁷

Order of
Component
n          u           v           G           H           G/(G²+H²)    H/(G²+H²)

0           0          0           10.0000      0          .10000       0
1          -.11009      .57767     11.1688      4.1458     .07869       .02921
2          -.30144      .98920     14.3353      7.6696     .05423       .02902
3          -.46500     1.28383     18.4982     10.4425     .04100       .02314
4          -.59500     1.51475     23.1094     12.7773     .03314       .01832
5          -.69800     1.70500     29.2088     14.8877     .02718       .01385
6          -.78000     1.86117     32.5165     16.7797     .02429       .01253

²⁴ Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, Table 22, p. 86, and
Fig. 50, p. 104.

²⁵ Moments taken about age 10.

²⁶ This value of μ_1 is taken with reference to the origin.

²⁷ Data from E. B. Kurtz, Life Expectancy of Physical Property, 1930, p. 104, fig. 50.

In particular, Fig. 4 shows the curve obtained by the summation of the first
six oscillatory components superposed over the aperiodic (constant) component.
It also shows the distribution curves of the first five generations within the
range of the time scale on the diagram. Summation of these reproduces,
within the errors of drawing, the resultant curve of the oscillatory solution,
except for the very early stages of the process, where the oscillatory solution is
of no practical interest, because the first generation alone dominates the whole
process, and this is given by the observational data direct or after fitting with
a curve such as (62).

It remains to consider briefly the relative advantages of the method of solution
by differentiation, as originally applied by Herbelot, Risser, and others,
on the one hand, and the use of the Hertz-Herglotz expansion, as introduced
for the treatment of this type of problem by Sharpe and Lotka, on the other.

One obvious advantage of the method of differentiation, when it is applicable,
is that the result is obtained in the form of a closed, finite expression for
each cycle.

Against this is to be reckoned, first, that the range of application of the 
method is severely limited. Preinreich in a recent issue of Econometrica (1938) 
uses for an illustration of the method a Pearson Type I curve, but in the very 
special and trivial form that the exponents are integers, namely 1 and 2. In 
practice the exponents will always be fractional, and then successive differentiations
do not terminate as obligingly as in Preinreich’s case. As already
noted, Preinreich, though citing Kurtz’s observational data on industrial re- 
placement, discreetly abstains from using these for his numerical example. 

Secondly, the disadvantage of a solution in form of an infinite series is more 
apparent than real. In practice the first few terms of the series obtained by 
the Hertz-Herglotz method will usually give an adequate representation of the 
facts, except for a short period immediately following the first installation. It 
is true that here this method, unless carried to high order components, may 
give an imperfect representation of industrial replacements, and may, in fact, 
give impossible negative values in this region, as in the example exhibited in 
Fig. 4. But this is practically unimportant, because in practice there will 
actually be few, if any, such very early replacements in an installation of finite 
dimensions. In fact, second and higher order replacements immediately after 
first installation are obviously out of the question in practice. For example, it 
may well happen once in a while that a telegraph pole is demolished on the 
very first day of service by collision with a truck. It is even imaginable that 
its replacement, put up the same day, might again be immediately demolished. 
But even in a country-wide installation one would hardly expect a third, fourth
or fifth replacement to be required on the day of installation. In other words, 
that part of the replacement curve which relates to the very early period after 
first installation, is composed practically of first replacements only. 

So, for example, in the diagram, Fig. 4, the curve of total replacements, up to
about t = 8, is simply the curve of first replacements, which is given directly
by the data of the problem. Within the range of errors of drawing the influence
of higher components is quite unobservable in this region.
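
The point can be illustrated with a small discrete-time sketch in Python. It is not the series solution (19); it simply steps a discrete analogue of the replacement equation forward in time. The function name, the triangular lifetime distribution (ages 5 to 15, mean 10) and the period length are assumptions made only for this illustration; the figure of 100,000 units follows the text.

```python
def renewal_curve(lifetime_pmf, n_units, horizon):
    """Replacements per period from a discrete analogue of the renewal equation.

    lifetime_pmf[a] is the assumed probability that a unit is replaced at age a.
    Returns (first, total): first-generation and total replacements per period.
    """
    max_age = len(lifetime_pmf) - 1
    first = [n_units * lifetime_pmf[t] if t <= max_age else 0.0
             for t in range(horizon + 1)]
    total = [0.0] * (horizon + 1)
    for t in range(1, horizon + 1):
        # Replacements of units that were themselves installed as replacements.
        later = sum(lifetime_pmf[s] * total[t - s]
                    for s in range(1, min(t, max_age) + 1))
        total[t] = first[t] + later
    return first, total


# Hypothetical triangular lifetime distribution on ages 5..15 (mean 10).
weights = [0] * 5 + [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
pmf = [w / sum(weights) for w in weights]
mean_life = sum(age * p for age, p in enumerate(pmf))

first, total = renewal_curve(pmf, 100_000, 600)
aperiodic = 100_000 / mean_life  # units installed divided by mean life
```

In this sketch the total curve coincides with the first-generation curve until a second replacement becomes possible (here, before period 10), and for large t it settles at the aperiodic level, the number of units installed divided by the mean life.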

The case is even more favorable in the application of the method to the 
problem of population growth, for here there is actually no reproduction whatever
until age a (say about 15) is reached. The part of the curve defined by
the series (19) carried only to a finite number of terms,* and applied to values
of t < a, is therefore simply rejected.** It may save many words of explanation
if the reader is simply referred to Fig. 4 on p. 897 of my previous publication 
“The Progeny of a Population Element,” which illustrates the point, the 
minimum age of reproduction being just short of 16. 

A major disadvantage of the method by differentiation is that it demands 
that the frequency distribution function φ(a) be given in the form of a suitable
analytic expression, or if it is not so given, that a suitable function or curve be 
fitted to it. The Hertz-Herglotz method, on the contrary, is directly applicable 
to the raw data, regardless of their form. Incidentally, curve fitting as practiced 
by Kurtz may produce a singular result. In 6 out of 7 of his types, the fitted 
frequency curve extends into the negative field, implying that there are some
replacements even before the actual installation. This may not be a very
serious defect if the area of the curve in the negative field is negligible, but it
should not pass unnoticed.

One of the principal merits of the Hertz-Herglotz expansion is that it renders 
the course of events over their whole extent, and, in particular, makes clear 
the mode of approach to the ultimate state represented by the aperiodic term. 
Because the method by differentiation requires a separate expression for each 
cycle, it is at best ill adapted to present to the eye or to the mind a comprehensive
view of the evolution of the aggregate as a whole.

In the introductory paragraphs it was pointed out that the problems of popu- 
lation growth and those of industrial replacement were closely analogous, though 
there were certain points of difference. It is of interest here to give considera- 
tion to these differences. 

One of these has already been noted. Replacement of industrial equipment 
may begin from the very moment of first installation, since accident as well as 
wear and tear must be provided for. Organic reproduction, on the other hand, 
does not occur immediately after birth. One result of this is that for any finite 
value of t, the number of generations contributing to the total births is itself 
finite; on the contrary, in the case of industrial replacement, if we interpret the 
equation (7) literally, there are at any moment an infinite number of generations
contributing. In practice this, of course, does not occur, and the equation
does not truly represent the facts in that a continuous distribution is assumed
throughout, whereas for the higher order replacements ultimately the early
frequencies are so thinned out that the discreteness of the units can no longer
be disregarded.

* There are, of course, limitations to the application of the solution (19). No one with
any experience in the treatment of practical problems by mathematical analysis would
think of fitting, by means of a reasonably limited number of terms, the first phases of the
processes here discussed, in the case of a rectangular distribution of the first generation,
for example. But the distributions with which we are actually concerned in practice are
far from rectangular. Such as they are, they are well adapted to the method, as is seen in
the two examples illustrated.

** There is nothing unusual in this rejection of negative values of the frequency function
where it falls outside the range of actual values. It is what we all do in using such a frequency
curve as Pearson’s type I, defined by a function which becomes negative outside
the range of actual interest.

Nevertheless, from the very start we must be prepared to consider several 
generations of replacement as contributing to the total; this lends a certain 
special interest, in dealing with the first cycle of replacements, to the method of 
solution by differentiation, as used by Herbelot, Risser, Zwinggi, Schulthess, 
and lately Preinreich. It is true that this interest is much diminished by the
limitations in the applicability of the method. 

On the other hand, in the case of organic reproduction, for the early part 
of the first cycle, the progeny of a population element belongs exclusively to a 
single (“first”) generation. Between t = 15 and t = 30, in our example, only
first generation births are taking place, and here the solution (19) is of more 
theoretical than practical interest, since the distribution of births is simply that 
of the first generation births. 

Another point of difference is that the curve of φ(a) in the case of industrial
replacement, if we may judge by Kurtz’s data, is a comparatively well behaved 
Pearson type curve. On the contrary, the corresponding curve of organic re- 
production is a very inconvenient type to fit by any of the standard methods. 
In view of this it is all the more remarkable that the solution (19) gives as good 
a fit as it does with only four components, as will be seen on referring to my 
original publication, “The Progeny of a Population Element,” p. 897, Fig. 4, 
already referred to. 

Lastly, while the analogy is exact so long as we are dealing with industrial or 
organic aggregates maintained at a constant level, an essential difference arises 
when the case of a growing aggregate is considered. Organic growth takes place 
by what might be called “multiple replacement,” that is, one individual in the 
course of life gives rise, on the average, to n individuals, where n may exceed 
unity. Analytically this finds expression in that

$$\int_0^\infty p(a)\,m(a)\,da > 1$$

and this is automatically taken care of in the solution (19), since
in such a case the single real root r > 0.
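
The link between this inequality and the sign of the real root can be checked numerically. The Python sketch below uses a discrete analogue of the characteristic equation, namely the sum over ages of exp(-r a) p(a) m(a) set equal to 1, and solves it by bisection. The function name, the bracketing interval and both schedules of p(a)m(a) are hypothetical values chosen only for illustration.

```python
import math


def real_root(net_maternity, lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection for r in sum_a exp(-r * a) * p(a)m(a) = 1 (discrete analogue).

    net_maternity maps an age a to an assumed value of p(a) * m(a).
    The left member decreases steadily as r increases, so bisection applies
    provided the root lies between lo and hi.
    """
    def excess(r):
        return sum(math.exp(-r * age) * w for age, w in net_maternity.items()) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid   # left member still exceeds 1: the root is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


growing = {20: 0.4, 25: 0.5, 30: 0.3}      # hypothetical; sums to 1.2 > 1
stationary = {20: 0.4, 25: 0.4, 30: 0.2}   # hypothetical; sums to 1.0

r_growing = real_root(growing)        # positive
r_stationary = real_root(stationary)  # zero within the tolerance
```

When the schedule sums to more than unity the root comes out positive, and when it sums to exactly unity the root is zero, which is the constant-level case in which the analogy with industrial replacement is exact.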

Growth of industrial equipment, on the other hand, takes place by new units
being installed in addition to replacement of disused units. The fundamental 
equations must be altered accordingly to take care of this case. 

In conclusion I want to make a remark regarding the function of such analyses 
as the one here presented. In this connection I can do no better than to quote 
a sentence from Cournot:* “Those skilled in mathematical analysis know that
its object is not simply to calculate numbers, but that it is also employed to find
the relations between magnitudes . . . .”

* A. Cournot, Researches into the Mathematical Principles of the Theory of Wealth, translated
by N. Bacon, Macmillan Co., 1897, p. 3.

It is essentially in this sense that the analysis of a problem of industrial re- 
placement is here offered. If we are merely interested in numbers, the direct 
arithmetical approach as practiced by Kurtz may be as good as any. But if an 
insight into the anatomy of the processes involved, and into their evolution from 
an initial condition to a final state is desired, then the setting up of the funda- 
mental equations, and their solution in exponential series or in other suitable 
analytical form, and a concise expression of the relation between the distributions 
in time of successive generations, or orders of replacements, have greatly superior
merit as compared with brute attacks by arithmetic without regard to mathe- 
matical form. Nor are the systematic relations (in terms of certain seminvari- 
ants) that have been shown to exist between the distribution of successive 
generations to be regarded merely as “short cuts” for their computation, though 
sometimes they may be found convenient in that way. Their real significance 
lies in that they serve to complete for us the analytical picture of the process of 
evolution of the system under consideration. 

Metropolitan Life Insurance Company,

New York, N. Y. 
ON THE MATHEMATICS OF THE REPRESENTATIVE METHOD
OF SAMPLING^1

By Allen T. Craig

1. Introduction. This paper is designed to present certain topics in mathematical
statistics which find application in some of the problems that arise in
what has been termed the representative method of sampling. 

For descriptive purposes, it seems convenient to consider two aspects of the 
representative method. The first of these may be called the method of pur- 
posive selection. This method can be roughly characterized by saying that it is 
the method employed when the samples are chosen in such a way that each 
sample will possess one or more characters, say certain averages, which are 
identical with the corresponding characters in the population from which the 
samples are drawn. The mathematical conditions which underlie this method 
are rather stringent, and both theoretical and practical investigations seem to 
have proved that in general no great amount of confidence can be placed in the 
results obtained. 

The second aspect of the representative method has been styled the method 
of random sampling. This method can take either of two forms which we may 
call the method of unrestricted random sampling and stratified random sampling. 
The first of these is the classical method of procedure. That is, a sample is 
drawn at random from a given population and on the basis of these data infer- 
ences are made concerning the nature of the population. On the other hand, 
when the method of stratified random sampling is used, the population is first 
separated into a large number of parts, called strata, and the sample consists 
of an equally large number of “partial samples,” each partial sample being
drawn from a different stratum. It appears, both from theoretical and prac- 
tical results, that this method of stratified random sampling enjoys many 
advantages not shared by the other methods. 

We now turn to the main purpose of this paper, namely that of enumerating 
some of the theorems and methods of mathematical statistics which serve useful 
purposes in this theory. Discussion of how these theorems find application in 
the method itself has been reserved for other participants on this program. 

2. Estimates. From our preliminary remarks, it is apparent that the representative
method is much concerned with the problem of estimating certain
unknown parameters of a statistical population. On this account, we first consider
the problem of estimates.

^1 Presented, at the invitation of the program committee, to a joint session of the Institute
of Mathematical Statistics and the American Statistical Association on December
29, 1938.

Consider a population with arithmetic mean $m$ and standard deviation $\sigma$.
Let $x_1, x_2, \ldots, x_n$ be $n$ independent items drawn from this population and
let $c_1, c_2, \ldots, c_n$ be any finite real constants, not all zero to avoid the trivial
case. Write $y = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$. Then the expected or arithmetic
mean value of $y$ is

$$\bar{y} = E(y) = m(c_1 + c_2 + \cdots + c_n),$$

and the variance of $y$ is

$$\sigma_y^2 = E[(y - \bar{y})^2] = \sigma^2(c_1^2 + c_2^2 + \cdots + c_n^2).$$

Suppose we inquire into the probability that $y$ will have a value which is within
a preassigned $\epsilon$ of its expected value. To this end, let $C$ be the numerical
value of the numerically greatest of the set $c_1, \ldots, c_n$, so that $\sigma_y^2 \le n\sigma^2 C^2$.
Then by Tchebycheff’s inequality $p$, the probability that $|y - \bar{y}| < \epsilon$, where $\epsilon$
is an arbitrarily small positive number, is such that

$$p > 1 - \frac{\sigma_y^2}{\epsilon^2}$$

or

$$p > 1 - \frac{n\sigma^2 C^2}{\epsilon^2}.$$

In general, this inequality will have little interest. But if $C$ is of the form
$M/n^{(1+\delta)/2}$, $M$ independent of $n$, $\delta > 0$, then $p > 1 - \sigma^2 M^2/(\epsilon^2 n^{\delta})$. By increasing $n$
the right member can be made as near to one as we please. This means then
that if we have a population with a finite variance and if we construct a linear 
function of the observations with coefficients of the nature indicated, we can, 
by increasing the size of the sample, make the probability approach one that 
the linear function will have a value arbitrarily close to its expected value.^2
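
As a numerical illustration of this bound, the following Python sketch evaluates the lower bound $1 - n\sigma^2 C^2/\epsilon^2$ for the arithmetic mean, for which every coefficient equals $1/n$. The function name and the values chosen for $\sigma$ and $\epsilon$ are assumptions made only for the illustration.

```python
def chebyshev_lower_bound(coeffs, sigma, eps):
    """Lower bound on P(|y - E(y)| < eps) for y = sum(c_i * x_i).

    Uses the cruder form of the inequality, 1 - n * sigma^2 * C^2 / eps^2,
    where C is the numerically greatest coefficient.
    """
    n = len(coeffs)
    c_max = max(abs(c) for c in coeffs)
    return 1.0 - n * sigma**2 * c_max**2 / eps**2


# Arithmetic mean: c_i = 1/n, so the bound becomes 1 - sigma^2 / (n * eps^2).
sample_sizes = (100, 1000, 10000)
bounds = [chebyshev_lower_bound([1.0 / n] * n, sigma=2.0, eps=0.5)
          for n in sample_sizes]
```

With these assumed values the bound rises from 0.84 at n = 100 toward one as n increases, which is the behavior described above for coefficients that shrink fast enough with n.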

Now suppose that instead of constructing an arbitrary linear function we 
attempt to construct a function which will be an estimate of some particular
parameter of the population. If the estimate is to be most serviceable, we
should like to be able, by governing the size of the sample, to be as certain as 
we like that the estimate will have a value arbitrarily near that of the parameter. 
The preceding discussion shows that we can best achieve this by requiring that 
the expected value of the estimate be equal to the parameter sought. An 
estimate such as that just described is frequently called an unbiased estimate. 
The use of such estimates in statistical problems makes it possible to avoid 
systematic errors in estimating parameters. In general, unique unbiased estimates
of a parameter do not exist. For example, the arithmetic mean $m$ of
the population can be estimated from the sample $x_1, \ldots, x_n$ by any one of a
large number of unbiased estimates such as $(x_1 + x_2 + \cdots + x_n)/n$, $(x_1 + x_n)/2$,
$x_1$, and so on without limit. Thus it becomes necessary to make a choice of
the unbiased estimate to be used. An appropriate criterion is that the unbiased
estimate whose distribution has the smallest variance is the best to use. The
reason for this can be seen by examining the preceding formula $p > 1 - \sigma_y^2/\epsilon^2$.
For if $y_1$ and $y_2$ are two unbiased estimates of the same parameter and if
$\sigma_{y_1}^2 < \sigma_{y_2}^2$, then in $p_1 > 1 - \sigma_{y_1}^2/\epsilon^2$ and $p_2 > 1 - \sigma_{y_2}^2/\epsilon^2$ we see that
$1 - \sigma_{y_1}^2/\epsilon^2$ is more nearly equal to one than is $1 - \sigma_{y_2}^2/\epsilon^2$. Because of this fact we prefer, at least
in most problems, to use $y_1$ rather than $y_2$ as an estimate of the unknown
parameter. An unbiased estimate whose sampling variance is a minimum is
sometimes called a best estimate.^3 It should not be inferred that the word
“best” has any implications other than those stated explicitly in the definition.

^2 Under these conditions, the function of the observations is said to converge stochastically
to its expected value.
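
The comparison of the three unbiased estimates of $m$ named above can be made concrete. The Python sketch below uses the variance formula $\sigma^2(c_1^2 + \cdots + c_n^2)$ for a linear function of independent items; the function names and the values of $n$ and $\sigma$ are assumptions chosen for illustration.

```python
def variance_of_linear_estimate(coeffs, sigma):
    """Variance of y = sum(c_i * x_i) for independent x_i with common sigma."""
    return sigma**2 * sum(c * c for c in coeffs)


def is_unbiased_for_mean(coeffs, tol=1e-12):
    """E(y) = m * sum(c_i), so y is unbiased for m when the c_i sum to one."""
    return abs(sum(coeffs) - 1.0) < tol


n, sigma = 10, 3.0  # assumed sample size and standard deviation
estimates = {
    "sample mean": [1.0 / n] * n,
    "(x_1 + x_n)/2": [0.5] + [0.0] * (n - 2) + [0.5],
    "x_1 alone": [1.0] + [0.0] * (n - 1),
}
variances = {name: variance_of_linear_estimate(c, sigma)
             for name, c in estimates.items()}
```

All three sets of coefficients sum to one, so each estimate is unbiased, but their variances are $\sigma^2/n$, $\sigma^2/2$ and $\sigma^2$ respectively; under the stated criterion the sample mean is preferred among these three.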

The question very naturally arises as to whether we can determine these 
best estimates in particular cases. In general we can not determine them, but 
under certain conditions we can find best estimates if we are dealing with linear 
functions of the observations. A method and the conditions are set forth in 
an important theorem due to Markoff. We now consider his method. 

3. Markoff’s Method. Let there be given $n$ statistical populations with
arithmetic means $m_1, m_2, \ldots, m_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$
respectively. We assume that no correlation exists between any of the populations.
Furthermore, suppose that each of the $n$ arithmetic means can be
expressed linearly in terms of $k$ unknown, but unique, parameters, say
$z_1, z_2, \ldots, z_k$. Thus

$$\begin{aligned}
m_1 &= a_{11}z_1 + a_{12}z_2 + \cdots + a_{1k}z_k \\
m_2 &= a_{21}z_1 + a_{22}z_2 + \cdots + a_{2k}z_k \\
&\;\;\vdots \\
m_n &= a_{n1}z_1 + a_{n2}z_2 + \cdots + a_{nk}z_k,
\end{aligned} \tag{1}$$

where the $a$’s are known constants. Likewise, let $T$ be a parameter which is
expressible linearly in terms of the same $k$ unknown parameters, say $T =
b_1 z_1 + b_2 z_2 + \cdots + b_k z_k$, where the $b$’s are given constants. We draw a sample
of $n$ independent items, $x_1, x_2, \ldots, x_n$, in which one item is drawn from each
of the $n$ populations. On the basis of this sample we seek to determine a set
of numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ such that $T' = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n$ is the
best estimate of $T$.

^3 An estimate of a parameter which converges stochastically (cf. footnote 2) to that
parameter is called a consistent estimate of the parameter. If a consistent estimate has a
distribution which is normal for large samples and if the variance of that distribution is
smaller than the variance of any other consistent estimate which also has a normal distribution
for large samples, then the estimate is called efficient. It should be observed that
our definition of best estimate requires an unbiased estimate, whereas consistent and
efficient estimates may be biased.

Before attempting to find the solution, if one exists, let us first examine the
mathematical implications of the problem. In the first place, in order that
parameters $z_1, \ldots, z_k$ may exist, it is necessary and sufficient that the matrices
$A$ and $B$, where

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} & m_1 \\
a_{21} & a_{22} & \cdots & a_{2k} & m_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk} & m_n
\end{pmatrix},$$

have the same rank. Thus we require that $A$ and $B$ have the common rank $R$.
This being satisfied, we note further that if $k > n$, there will be infinitely many
values of the $z$’s which will satisfy the equations (1). Thus we require in addition
that $k \le n$. Finally, we note that if the common rank $R$ is less than $k$,
there will be infinitely many values of the $z$’s which will satisfy the system (1).
Hence we must have $R = k \le n$.

We now turn to a consideration of the solution of the problem. Whatever
the values of the $\lambda$’s, we have for the mean value and the variance of $T'$

$$E(T') = \lambda_1 m_1 + \cdots + \lambda_n m_n
= \lambda_1 \sum_j a_{1j}z_j + \cdots + \lambda_n \sum_j a_{nj}z_j,$$

and

$$\sigma_{T'}^2 = \lambda_1^2\sigma_1^2 + \cdots + \lambda_n^2\sigma_n^2,$$

respectively. Since $E(T')$ must equal $T$ as a part of the condition for a best
estimate, then

$$\lambda_1 \sum_j a_{1j}z_j + \cdots + \lambda_n \sum_j a_{nj}z_j = b_1 z_1 + \cdots + b_k z_k$$

identically in the $z$’s. That is, the coefficients of $z_1, \ldots, z_k$ in the left member
must equal the corresponding coefficients in the right member. Accordingly,

$$\begin{aligned}
a_{11}\lambda_1 + a_{21}\lambda_2 + \cdots + a_{n1}\lambda_n &= b_1 \\
a_{12}\lambda_1 + a_{22}\lambda_2 + \cdots + a_{n2}\lambda_n &= b_2 \\
&\;\;\vdots \\
a_{1k}\lambda_1 + a_{2k}\lambda_2 + \cdots + a_{nk}\lambda_n &= b_k.
\end{aligned} \tag{2}$$

If these equations are to have solutions for $\lambda_1, \ldots, \lambda_n$, we must make the
additional assumption that the matrix $C$, where

$$C = \begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} & b_1 \\
a_{12} & a_{22} & \cdots & a_{n2} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{1k} & a_{2k} & \cdots & a_{nk} & b_k
\end{pmatrix},$$

has the same rank as the matrix of the coefficients, namely $R$. If this condition
is satisfied we can write equations (2) in the form

$$\begin{aligned}
a_{11}\lambda_1 + \cdots + a_{k1}\lambda_k &= b_1 - a_{k+1,1}\lambda_{k+1} - \cdots - a_{n1}\lambda_n \\
&\;\;\vdots \\
a_{1k}\lambda_1 + \cdots + a_{kk}\lambda_k &= b_k - a_{k+1,k}\lambda_{k+1} - \cdots - a_{nk}\lambda_n
\end{aligned} \tag{3}$$

and solve for $\lambda_1, \ldots, \lambda_k$ in terms of the $a$’s, the $b$’s, and $\lambda_{k+1}, \ldots, \lambda_n$. Here,
without any essential loss of generality, we take the non-vanishing $k$-rowed
determinant to be that of the coefficients of $\lambda_1, \ldots, \lambda_k$ in equations (2). Thus
for arbitrarily assigned values of $\lambda_{k+1}, \ldots, \lambda_n$, we can compute the values of
$\lambda_1, \ldots, \lambda_k$ and these $n$ values of the $\lambda$’s will give us a $T'$ which is an unbiased
estimate of $T$. That there will be, in general, an unlimited number of sets of
values of the $\lambda$’s is in keeping with our previous observation that unique unbiased
estimates usually do not exist.

The next part of the problem will consist in determining which, if any,
of the above sets of $\lambda$’s will make $\sigma_{T'}^2$ a minimum. We recall that $\sigma_{T'}^2 =
\lambda_1^2\sigma_1^2 + \cdots + \lambda_n^2\sigma_n^2$. In $\sigma_{T'}^2$ let us replace $\lambda_1, \ldots, \lambda_k$ by their values (in
terms of $\lambda_{k+1}, \ldots, \lambda_n$) which we obtained by solving the system (3). Then
$\sigma_{T'}^2$ will be expressed in terms of $\sigma_1, \ldots, \sigma_n$, the $a$’s, the $b$’s, and $\lambda_{k+1}, \ldots, \lambda_n$.
We next take the partial derivative of $\sigma_{T'}^2$ with respect to each of $\lambda_{k+1}, \ldots, \lambda_n$.
On equating these partial derivatives to zero we will have a system of $n - k$
linear equations in the $n - k$ unknowns $\lambda_{k+1}, \ldots, \lambda_n$. If these equations
yield unique values for $\lambda_{k+1}, \ldots, \lambda_n$, they will in turn determine unique
values of $\lambda_1, \ldots, \lambda_k$. This gives us a unique set of $\lambda$’s such that at one and
the same time

$$E(T') = T \quad\text{and}\quad \sigma_{T'}^2 \text{ is a minimum.}$$


The procedure which we have just outlined is most tedious to carry out in a
particular case. Because of the insight of Markoff, a much better scheme is
available for finding the best estimate of $T$. Consider the function of
$z_1, \ldots, z_k$,

$$S = \sum_{i=1}^{n} \left(\frac{x_i - a_{i1}z_1 - \cdots - a_{ik}z_k}{\sigma_i}\right)^2.$$

Evaluate $\partial S/\partial z_1, \ldots, \partial S/\partial z_k$ and equate these partial derivatives to zero. This yields
the following system of $k$ linear equations in the $k$ unknowns $z_1, \ldots, z_k$,

$$z_1 \sum_i \frac{a_{i1}a_{ij}}{\sigma_i^2} + z_2 \sum_i \frac{a_{i2}a_{ij}}{\sigma_i^2} + \cdots + z_k \sum_i \frac{a_{ik}a_{ij}}{\sigma_i^2}
= \sum_i \frac{a_{ij}x_i}{\sigma_i^2}, \qquad j = 1, \ldots, k. \tag{4}$$

If the system (4) yields unique values for the $z$’s, these values, when substituted
in $T$, yield exactly the same estimate of $T$ as was found by substituting for
the $\lambda$’s in $T'$.

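Perhaps an illustration will make this clearer. Suppose we have $n = 2$
populations and that the means $m_1$ and $m_2$ are expressible linearly in terms of
$k = 1$ parameter $z_1$. Our equations (1) become

$$m_1 = a_{11}z_1, \qquad m_2 = a_{21}z_1. \tag{1'}$$

Similarly, we have $T = b_1 z_1$ and $T' = \lambda_1 x_1 + \lambda_2 x_2$. We first determine the
$\lambda$’s such that $T'$ is the best estimate of $T$. In accordance with the preceding
steps, equations (2) become

$$a_{11}\lambda_1 + a_{21}\lambda_2 = b_1, \tag{2'}$$

and the system (3) becomes

$$\lambda_1 = \frac{b_1 - a_{21}\lambda_2}{a_{11}}, \qquad a_{11} \ne 0. \tag{3'}$$

Then

$$\sigma_{T'}^2 = \lambda_1^2\sigma_1^2 + \lambda_2^2\sigma_2^2
= \frac{(b_1 - a_{21}\lambda_2)^2}{a_{11}^2}\,\sigma_1^2 + \lambda_2^2\sigma_2^2$$

because of (3'). Thus

$$\frac{\partial \sigma_{T'}^2}{\partial \lambda_2} = \frac{-2a_{21}(b_1 - a_{21}\lambda_2)\sigma_1^2}{a_{11}^2} + 2\lambda_2\sigma_2^2,$$

and for a minimum $\sigma_{T'}^2$ we write $\partial \sigma_{T'}^2/\partial \lambda_2 = 0$ so that

$$\lambda_2 = \frac{a_{21}b_1\sigma_1^2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}.$$

Since $\lambda_1 = (b_1 - a_{21}\lambda_2)/a_{11}$, then

$$\lambda_1 = \frac{b_1 a_{11}\sigma_2^2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}.$$

Our best estimate of $T$ is found from $T'$ and it is

$$T' = \frac{b_1 a_{11}\sigma_2^2 x_1 + b_1 a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}.$$

By Markoff’s method we would form the function

$$S = \left(\frac{x_1 - a_{11}z_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - a_{21}z_1}{\sigma_2}\right)^2.$$

The system (4) reduces to merely

$$z_1 = \frac{a_{11}\sigma_2^2 x_1 + a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2}. \tag{4'}$$

We substitute this value of $z_1$ in $T = b_1 z_1$ and obtain

$$T = \frac{b_1 a_{11}\sigma_2^2 x_1 + b_1 a_{21}\sigma_1^2 x_2}{a_{11}^2\sigma_2^2 + a_{21}^2\sigma_1^2},$$

which is the estimated value $T'$ above.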
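
The agreement of the two routes in this illustration can also be checked numerically. The Python sketch below computes the estimate once through the $\lambda$’s and once through Markoff’s function; the function names and all numerical values are hypothetical, with `s1` and `s2` standing for the two standard deviations.

```python
def best_estimate_via_lambdas(x1, x2, a11, a21, b1, s1, s2):
    """Minimize the variance of T' = l1*x1 + l2*x2 subject to a11*l1 + a21*l2 = b1."""
    denom = a11**2 * s2**2 + a21**2 * s1**2
    l2 = a21 * b1 * s1**2 / denom   # from setting the derivative to zero
    l1 = (b1 - a21 * l2) / a11      # the unbiasedness condition solved for l1
    return l1 * x1 + l2 * x2


def best_estimate_via_markoff(x1, x2, a11, a21, b1, s1, s2):
    """Minimize ((x1 - a11*z1)/s1)^2 + ((x2 - a21*z1)/s2)^2, then form T = b1*z1."""
    z1 = (a11 * s2**2 * x1 + a21 * s1**2 * x2) / (a11**2 * s2**2 + a21**2 * s1**2)
    return b1 * z1


def variance_of_estimate(l2, a11, a21, b1, s1, s2):
    """Variance of T' as a function of l2 once l1 is fixed by unbiasedness."""
    l1 = (b1 - a21 * l2) / a11
    return l1**2 * s1**2 + l2**2 * s2**2


# Hypothetical observations and constants.
args = dict(x1=4.2, x2=9.1, a11=1.0, a21=2.0, b1=3.0, s1=0.5, s2=1.5)
t_lambda = best_estimate_via_lambdas(**args)
t_markoff = best_estimate_via_markoff(**args)
```

The two values agree apart from rounding, and `variance_of_estimate` can be used to confirm that moving $\lambda_2$ away from its computed value in either direction increases the variance.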






4. Neyman’s modification of Markoff’s Method. We are indebted to Ney- 
man for a modification and adaptation of the Markoff method so as to make 
the method applicable to some of the problems of stratified random sampling. 
One of his examples will best illustrate the method. 

Suppose that a given population is divided into $n$ strata. Let the $j$th stratum
contain $M_j$ items and let these items be $u_{j1}, u_{j2}, \ldots, u_{jM_j}$. The mean and
the variance of this stratum are then

$$\bar{u}_j = \frac{1}{M_j}\sum_k u_{jk} \quad\text{and}\quad \sigma_j^2 = \frac{1}{M_j}\sum_k (u_{jk} - \bar{u}_j)^2.$$

Let $T$ be the parameter $T = M_1\bar{u}_1 + M_2\bar{u}_2 + \cdots + M_n\bar{u}_n$, so that
$T/\sum_j M_j$, the mean of the population, is expressed as a linear function
of the means of the $n$ strata. We draw at random a sample of $N$ items, the
sample consisting of $n$ partial samples, one partial sample being drawn from
each of the $n$ strata. Suppose there are $n_1$ items in the partial sample from
the first stratum, $n_2$ from the second, and so on. Thus $n_1 + n_2 + \cdots + n_n = N$
and the entire sample consists of the $n$ partial samples

$$\begin{aligned}
&x_{11}, x_{12}, \ldots, x_{1n_1} \\
&x_{21}, x_{22}, \ldots, x_{2n_2} \\
&\;\;\vdots \\
&x_{n1}, x_{n2}, \ldots, x_{nn_n}.
\end{aligned}$$

From these $N$ data we propose constructing an estimate

$$T' = \lambda_{11}x_{11} + \cdots + \lambda_{1n_1}x_{1n_1} + \cdots + \lambda_{n1}x_{n1} + \cdots + \lambda_{nn_n}x_{nn_n}$$

which will be the best estimate of $T$. Now the expected value of $T'$ is

$$E[T'] = E\Bigl[\sum_j \sum_k \lambda_{jk}x_{jk}\Bigr] = \sum_j \sum_k \lambda_{jk}\bar{u}_j = \sum_j \bar{u}_j \sum_k \lambda_{jk},$$

which, by hypothesis, must equal $T$. Thus

$$\sum_{j=1}^{n} \bar{u}_j \sum_{k=1}^{n_j} \lambda_{jk} = \sum_{j=1}^{n} M_j\bar{u}_j$$

identically in the $\bar{u}$’s. Hence $\sum_j \bar{u}_j\bigl(M_j - \sum_k \lambda_{jk}\bigr) = 0$ which requires that the
coefficients of $\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n$ must be zero. That is

$$\sum_{k=1}^{n_1} \lambda_{1k} = M_1, \quad \ldots, \quad \sum_{k=1}^{n_n} \lambda_{nk} = M_n.$$

Of course there are infinitely many $\lambda$’s which will satisfy these equations. But
we can eliminate all but one set by imposing the condition that $\sigma_{T'}^2$ shall be a
minimum. The algebra of mathematical expectation can be used to show that


or j> 


! = a' 




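which will be a minimum when $\sum_k \bigl(\lambda_{jk} - \frac{1}{n_j}\sum_l \lambda_{jl}\bigr)^2 = 0$, $j = 1, 2, \ldots, n$. Since
this is a sum of real squares, each term in the sum must be zero. Thus,
$\lambda_{jk} = \frac{1}{n_j}\sum_l \lambda_{jl}$. Since $\sum_l \lambda_{jl}$ must equal $M_j$ in order that $E(T') = T$, then
$\lambda_{jk} = M_j/n_j$, which uniquely determines the $\lambda$’s and hence our best estimate $T'$ of $T$.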
n, 

It is important to observe that Neyman’s adaptation does not assume that 
the various strata are uncorrelated nor that there are necessarily replacements 
after each drawing in taking the sample. 
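
With every weight for the $j$th stratum equal to $M_j/n_j$, the best estimate is simply each partial-sample mean multiplied by the size of its stratum and summed. A minimal Python sketch follows; the function name, the stratum sizes and the sample values are all hypothetical.

```python
def stratified_total_estimate(stratum_sizes, partial_samples):
    """T' = sum over strata of (M_j / n_j) * (sum of the j-th partial sample)."""
    total = 0.0
    for m_j, sample in zip(stratum_sizes, partial_samples):
        n_j = len(sample)
        total += (m_j / n_j) * sum(sample)
    return total


sizes = [500, 300, 200]                 # M_j, hypothetical stratum sizes
samples = [[10.0, 12.0, 11.0, 9.0],     # partial sample from stratum 1
           [20.0, 22.0],                # partial sample from stratum 2
           [5.0, 7.0, 6.0]]             # partial sample from stratum 3

t_prime = stratified_total_estimate(sizes, samples)
population_mean_estimate = t_prime / sum(sizes)
```

With these assumed figures the partial-sample means are 10.5, 21 and 6, so the estimate of $T$ is 12,750 and the estimated population mean is 12.75.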


5. Estimation of Ratios. In certain problems in representative sampling it
may be necessary to estimate both the numerator and the denominator of a
fraction, say $T/U$. If $T'$ and $U'$ are linear estimates of $T$ and $U$ then for large
samples both $T'$ and $U'$ will be approximately normally distributed in most
cases. Further, if $T'$ and $U'$ are correlated, they will usually be approximately
normally correlated. Geary has proved that if we write

$$\omega = \frac{b + T'}{a + U'},$$

where $a$ and $b$ are constants and $U'$ and $T'$ are measured from their expected
values, then

$$\frac{a\omega - b}{\sqrt{\sigma_{U'}^2\omega^2 - 2r\sigma_{U'}\sigma_{T'}\omega + \sigma_{T'}^2}}$$

is approximately normally distributed with mean zero and unit variance provided
$a > 3\sigma_{U'}$. Here $r$ is the correlation coefficient between $T'$ and $U'$. For
large samples this provides a convenient method of testing the significance of
the difference between an observed and a hypothetical ratio of two linear
estimates.
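
A short Python sketch of how the quantity quoted from Geary might be computed is given below. The function name and all numerical values are hypothetical; `a` and `b` are read here as the hypothetical expected values of the denominator and numerator, which is an interpretation of the statement above rather than something spelled out in it.

```python
import math


def geary_statistic(omega, a, b, sd_t, sd_u, r):
    """(a*omega - b) / sqrt(sd_u^2*omega^2 - 2*r*sd_u*sd_t*omega + sd_t^2).

    Approximately standard normal provided a > 3 * sd_u, the condition quoted above.
    """
    if a <= 3 * sd_u:
        raise ValueError("approximation requires a > 3 * sd_u")
    denom = math.sqrt(sd_u**2 * omega**2 - 2 * r * sd_u * sd_t * omega + sd_t**2)
    return (a * omega - b) / denom


# Hypothetical ratio b/a = 0.5 tested against an observed ratio of 0.53.
stat = geary_statistic(omega=0.53, a=100.0, b=50.0, sd_t=2.0, sd_u=3.0, r=0.3)
at_hypothesis = geary_statistic(omega=0.5, a=100.0, b=50.0, sd_t=2.0, sd_u=3.0, r=0.3)
```

The value of `stat` would then be referred to the normal curve in the usual way; an observed ratio equal to the hypothetical ratio gives a value of zero.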

6. Fiducial Inference. After an estimate of a parameter has been made, it 
is usually desirable to make some inference about the true value of the pa- 
rameter. For many years the concept of probable error was used in this con- 
nection. But the use of the probable error involves the assumption that all 
values of the unknown parameter are equally likely. This assumption is 
questionable and efforts to avoid making the assumption have led to a theory 
called fiducial inference. This method of statistical inference has broad implications
but limitations on our time do not permit our discussing the topic. At
the close of this paper, we give certain references to the subject, including some 
of an expository nature. 

7. Conclusion. As stated in the introduction, this paper purports to give 
an exposition of some of the topics in mathematical statistics which find application
in the representative method of sampling. Necessarily considerable
selection of material had to be made. We believe, however, that the problem
of the best estimate and an appropriate method of obtaining such an estimate
are fundamental, and we hope that our exposition has helped to make clear 
these concepts of mathematical statistics which have proved so useful in the 
representative method. 
1. Introduction. There are a number of fields in which experimental data
cannot be treated with any success by means of the usual “Student’s” test and,
very probably, by means of the more general analysis of variance z-test of
Fisher. It is known in fact [1] that the t-test, as applied to two samples, is
only valid when the populations from which the samples are drawn have equal
variances. As the z-test is of a nature similar to the t-test, with the difference
that it is applied to detect differentiation in means of more than two populations,
a similar conclusion seems very likely. Thus, whenever we have to
compare means of populations with distinctly different variances, we have to 
look for some new tests. It may be useful to mention at once two instances 
in which the situation mentioned actually arises. 

As a first instance we may quote certain entomological experiments. Suppose 
it is desired to test the efficiency of several treatments intended to destroy 
certain larvae on a field. The experiments are arranged in the usual way. 
The treatments compared are applied to particular plots with several replica- 
tions and then the plots (or smaller parts of them) are inspected and all the 
surviving larvae are counted. Thus the observations represent the numbers
of surviving larvae in several equal areas. It happens frequently that, while 
there is room for doubt as to whether there is any significant difference between 
the average number of survivors corresponding to particular treatments, there 
is no doubt whatever that the variability of the observations differs from treat- 
ment to treatment. 

We have another similar case in bacteriology. The experiments I have in
mind consist in determining the bacterial density by the so called “plating 
method.” This consists in taking a number of samples of the analyzed liquid 
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and in spreading them separately on Petri plates. After a suitable period of 
time a number of colonies appear on the plates and their numbers represent the 
observational figures. I am informed that the variability of such observations 
does not depend very much on the technique of mixing the liquid and of taking 
the samples (when this technique is on a proper level) but does depend 
considerably on the kind and on the number of bacteria present in the liquid. 

The above examples justify an effort to find some new and more appropriate 
test. The first step in this direction must consist in an analysis of the ma- 
chinery behind the observable distributions and in deducing their analytical 
form. Once this problem is solved and repeated comparisons show a satis- 
factory agreement between the theory and the observation, we may proceed to 
the next step and deduce the appropriate tests. 

The purpose of the present paper consists in deducing a family of distribu- 
tions which provide a reasonably good fit in several cases in which they have 
been tested. It may be hoped that they will prove satisfactory also in many 
cases in the future. 

2. Distribution of larvae in experimental plots. When the problem of the 
distribution of larvae in experimental plots first arose, attempts were made to 
fit the Poisson Law of frequency. These attempts, however, failed almost 
invariably with the characteristic feature that, as compared with the Poisson 
Law, there were too many empty plots and too few plots with only one larva. 
A similar circumstance is frequently, though not so regularly, observed in 
counts of microorganisms in single squares of a haemacytometer. These facts 
suggest that the distributions considered belong to a class which Pólya [3] 
proposed to call “contagious”: the presence of one larva within an experimental 
plot increases the chance of there being some more larvae. And it is not diffi- 
cult to see the cause of this dependence. Larvae are hatched from eggs which 
are being laid in so-called “masses.” After being hatched they begin to travel 
in search of food. Their movements are slow and therefore, whenever in a 
given plot we find a larva, this means that the mass of eggs, from which it was 
hatched, must have been laid somewhere near, and this in turn means that we 
are likely to find in the same plot some more larvae from the same litter. Of 
course, there may also be others coming from other litters. 

A similar explanation may apply also to microorganisms counted in single 
squares of a haemacytometer or to colonies on parallel plates. However, here 
the situation does not seem as clear as in the case of larvae. As far as the 
haemacytometer counts are concerned, another cause of contagiousness 
may also be suggested. Witnessing once the process of preparation of the 
experiment, I noticed that, immediately after the drop of liquid was deposited into 
the chamber of the haemacytometer and for some time after, the positions of 
cells seen under the microscope were not fixed. Some of them seemed to lie 
on the bottom and the others were floating downwards in an irregular 
movement. Trying to follow the movements of particular cells I had the impression 
that they were slightly attracted by the cells already stationary or 
semi-stationary on the bottom of the chamber. If this impression of mine is justified, 
then the attraction of the floating cells by those already on the bottom could 
explain the contagiousness of the resulting distribution. It is known, how- 
ever, that this contagiousness is always rather small and that frequently the 
distribution of cells in the squares of the haemacytometer does follow the 
Poisson Law very closely. 

Owing to the fact that the cause of the contagiousness of the distribution 
of larvae in experimental plots is clear, we shall deal primarily with the distri- 
bution of larvae. Consequently, if the theoretical distributions that we shall 
deduce fit the empirical ones, we shall be more or less justified in assuming that 
we guessed the essential features of the actual machinery of movements of the 
larvae. On the other hand, if the same theoretical distributions appear also 
to fit satisfactorily the empirical counts of bacteria, then in respect of these 
applications it will be safer to consider that we were lucky enough to find a 
sufficiently flexible interpolation formula. 

After those preliminaries we may proceed to a more accurate specification 
of the conditions of the problem considered. The experimental plot in which 
the larvae are counted will be denoted by P. We shall make no restriction 
as to the shape of this plot, but we shall assume that its area, which we shall 
take as unity, is small compared with that of the experimental field, F. The 
latter will be assumed to possess M units of area. We shall further assume 
that the moths laying eggs on the field F select spots for this purpose in a purely 
random manner. This presupposes that the experimental field is uniform in 
many relevant respects, e.g. is sown in all its parts by the same kind of plant, etc. 
Denoting by ξ and η the coordinates of the mass of eggs laid by some particular 
moth on the field F, we shall treat them as random variables with the elementary 
probability law 

(1)    $p(\xi, \eta) = \dfrac{1}{M}$

everywhere within F and zero elsewhere. After the larvae are hatched from the 
eggs there will be some mortality among them. Let us denote by n the number 
of larvae hatched from the same mass of eggs, surviving at the moment when 
the counts are made. We shall treat n as a random variable and denote by 
p(n) its probability law. At the present moment the writer has no information 
as to what may be the nature of the function p(n). Consequently it will remain 
in our calculations in its general form and, wishing to obtain some formulae 
for immediate calculations, we shall have to substitute for p(n) hypothetical 
formulae which, on intuitive grounds, may seem plausible. If the larvae counted 
are all more or less of the same age, there is a possibility that p(n) does not differ 
very much from the Poisson Law, but this point might be verified experimentally 
and we shall not insist on its being necessarily true. 

Consider now a single larva, survivor at the moment of observation, which 
was hatched out at a point with coordinates ξ and η. Denote by x and y the 
coordinates of this larva at the moment of counts. We shall consider x and y as 
random variables. It is obvious that the probability law of x and y must 
depend on the values of ξ and η. We shall assume that the dependence is 
of a particular character; namely, that the probability law of x and y given 
ξ and η is a function of the differences x − ξ and y − η. We shall denote it by 
f(x − ξ, y − η). 

There is very little that we may consider as known about the function 
f(x − ξ, y − η). It may be treated as describing the habits of travelling of the larvae. 
There are some indications that there are certain directions in which the larvae 
tend to travel rather than in others, but they are too vague to be taken into 
consideration. Only one thing is certain: during the period of time between 
the birth of the larvae and the moment that the counts are made the larvae are 
able to travel only some limited distance. Consequently we shall assume 
that for sufficiently large values of |x − ξ| and |y − η| the function 
f(x − ξ, y − η) is identically zero. Otherwise we shall not make any further assumption 
concerning f(x − ξ, y − η), and it will remain arbitrary in our calculations until 
we reach the final general formula. 

While abstaining from making arbitrary assumptions concerning the habits 
of single larvae, we shall make one concerning the habits of several of them. 
This assumption, however, seems to be very plausible. We shall assume that 
the larvae have no social instincts, so that the random variables x and y 
corresponding to one larva are independent from those corresponding to any other, 
that is to say, apart from the possible dependence on the same pair of ξ and η. 

Denote by N the total number of masses of eggs laid on the field F and let 
k_i be the number of larvae hatched from the i-th mass of eggs, surviving at the 
moment of observation and present within some particular experimental plot P. 
Finally let 

(2)    $X = \sum_{i=1}^{N} k_i$

be the total number of larvae to be found within this plot. Our purpose will be 
to use the above hypotheses in order to determine the probability law of X. 
In doing so we shall first find that of any of the k_i's. Obviously, when 
considering just one variable k_i, it would be useless to retain the subscript i, so 
that below we shall write simply k to denote the number of living larvae, to be 
found within P, all of which were hatched from the same mass of eggs, situated 
at some point (ξ, η). 
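
The model specified so far can be explored by simulation. The sketch below is only an illustration under assumptions that the text leaves open: p(n) is taken to be Poisson, each larva is displaced uniformly within a square of half-width `reach` around its mass of eggs, and the plot P is a unit square at the centre of a square field. All function and parameter names are hypothetical.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_plot_count(rng, field_side, masses_per_unit, mean_survivors, reach):
    """Simulate X, the number of larvae found in a unit plot at the centre of the field.

    Assumptions (not from the paper): p(n) is Poisson with mean `mean_survivors`,
    and each larva is displaced uniformly within a square of half-width `reach`.
    """
    n_masses = round(masses_per_unit * field_side ** 2)    # N = M m
    half = field_side / 2.0
    total = 0
    for _ in range(n_masses):
        xi = rng.uniform(-half, half)       # position of the mass of eggs, uniform on F
        eta = rng.uniform(-half, half)
        for _ in range(poisson_draw(rng, mean_survivors)):
            x = xi + rng.uniform(-reach, reach)
            y = eta + rng.uniform(-reach, reach)
            if abs(x) <= 0.5 and abs(y) <= 0.5:            # the larva lies inside the plot P
                total += 1
    return total


rng = random.Random(12345)
counts = [simulate_plot_count(rng, 10.0, 0.5, 4.0, 0.5) for _ in range(1000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

With these assumed settings the sample variance of the simulated counts is well above the sample mean, which is the kind of departure from the Poisson Law described at the beginning of this section.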

Let us first write the expression for the probability that one particular larva 
of that group will be found within P. This probability will be a function of 
ξ and η only, say 

(3)    $P(\xi, \eta) = \iint_{P} f(x - \xi,\, y - \eta)\, dx\, dy.$

Given that the number of survivors of the mass of eggs of the point (ξ, η) 
is n, the probability that exactly k of them will be found within P will be 
represented by the binomial formula, say 

(4)    $P\{k \mid n, \xi, \eta\} = \binom{n}{k} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}.$

It will be noticed that in writing this formula we use the hypothesis that the 
larvae have no social instincts. 

Multiplying (4) by the probability law of ξ and η, and integrating with respect 
to those variables over the whole field F, we shall obtain the probability, P{k | n}, 
that out of the n survivors of a mass of eggs, laid anywhere within F, exactly k 
larvae will be found within P: 

(5)    $P\{k \mid n\} = \dfrac{1}{M} \binom{n}{k} \iint_{F} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta.$

Multiplying this result by p(n) and summing for all values of n, we shall 
obtain the absolute probability of k having any specified value. 
However, before doing so, we must use the hypothesis about the function 
f(x − ξ, y − η) to deduce certain consequences concerning the integral in (5). 

Originally we did not make any assumption as to the origin of coordinates 
on the field F. It will be now convenient to assume that it is located somewhere 
within the experimental plot P, for example in its center or in any other easily 
specified point. Owing to the particular property of the function f(x − ξ, y − η) 
it will now follow that, for sufficiently large values of ξ and η, the probability 
P(ξ, η) will be equal to zero. Let us denote by A the part of the experimental 
field where P(ξ, η) > 0. Obviously A denotes the set of points, a, in F such that, 
if a mass of eggs is laid in one of them, the distance of a from the plot P is not 
too large for the larvae hatched in a to reach the plot P before the moment of 
observation. Obviously also the plot P is included in A. Consequently the 
area of A, to be denoted by the same letter A, must be greater than unity. 
Owing to the lack of any precise knowledge of the nature of the function 
f(x − ξ, y − η) it is impossible to say anything about the shape of A. 

Let us now turn to the integral in (5). The function under this integral 
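changes its form according to whether the point (ξ, η) is within or without A. 
If k = 0, then the integral in (5) reduces to 

(6)    $\iint_{F} \bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta = M - A + \iint_{A} \bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta.$

If however k > 0, then 

(7)    $\iint_{F} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta = \iint_{A} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta.$

Now we can write 

(8)    $P\{k\} = \sum_{n=0}^{\infty} p(n)\, P\{k \mid n\},$

which gives in particular 

(9)    $P\{k = 0\} = 1 - \dfrac{A}{M} \left( 1 - \dfrac{1}{A} \iint_{A} \sum_{n=0}^{\infty} p(n)\,\bigl(1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta \right)$

and for k > 0 

(10)    $P\{k\} = \dfrac{1}{M} \iint_{A} \sum_{n=k}^{\infty} p(n) \binom{n}{k} P^{k}(\xi, \eta)\,\bigl(1 - P(\xi, \eta)\bigr)^{n-k}\, d\xi\, d\eta.$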

This is the general form of the probability law of k, which involves two 
unspecified functions p(n) and P(ξ, η). We shall not analyze it but proceed to the 
calculation of the characteristic function of k, which will then be used to 
calculate that of X. We have 

(11)    $\varphi_k(t) = \sum_{k=0}^{\infty} e^{itk}\, P\{k\}$

or, using (9) and (10), and after easy transformations 

(12)    $\varphi_k(t) = 1 - \dfrac{A}{M} \left( 1 - \dfrac{1}{A} \iint_{A} \sum_{n=0}^{\infty} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta \right).$

Owing to the assumption that the larvae have no social instincts all the 
variables k_1, k_2, ..., k_N in (2) must be considered as mutually independent. 
As the characteristic function of any of them has the same form (12), the 
characteristic function, φ_X(t), of their sum, X, will be represented by the Nth power 
of the expression (12). Denoting by m the average number of masses of eggs 
per unit of area of the field F, so that N = Mm, we shall have 

(13)    $\varphi_X(t) = \left[ 1 - \dfrac{A}{M} \left( 1 - \dfrac{1}{A} \iint_{A} \sum_{n=0}^{\infty} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta \right) \right]^{Mm}.$

This will be the characteristic function of X for any value of M. If it is desired 
to put into effect the assumption that “M is large”, we shall have to consider the 
limit of (13) for M → ∞. This will be denoted by φ(t) and we shall have 

(14)    $\varphi(t) = \exp\left\{ -Am \left( 1 - \dfrac{1}{A} \iint_{A} \sum_{n=0}^{\infty} p(n)\,\bigl(P(\xi, \eta)\, e^{it} + 1 - P(\xi, \eta)\bigr)^{n}\, d\xi\, d\eta \right) \right\}.$

In order to obtain the numerical value of the probability of X having any 
specified value X', it remains only to specify the functions p(n) and P(ξ, η) and 
to use the familiar formula 

(15)    $P\{X = X'\} = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} \varphi(t)\, e^{-itX'}\, dt.$
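
Formula (15) lends itself to direct numerical evaluation. The following sketch approximates the integral by an equally spaced sum over one period; the function names are hypothetical, and the Poisson characteristic function is used only as a test case whose probabilities are known.

```python
import cmath
import math


def probability_from_cf(cf, value, steps=2048):
    """Approximate (1 / 2 pi) * integral over (-pi, pi) of cf(t) * exp(-i t value) dt.

    For an integer-valued variable the integrand is periodic with period 2 pi,
    so an equally spaced (rectangle) rule over one period is used.
    """
    total = 0.0 + 0.0j
    for j in range(steps):
        t = -math.pi + 2.0 * math.pi * j / steps
        total += cf(t) * cmath.exp(-1j * t * value)
    return (total / steps).real


def poisson_cf(mean):
    """Characteristic function of the Poisson law, used here only as a check."""
    def cf(t):
        return cmath.exp(mean * (cmath.exp(1j * t) - 1.0))
    return cf


recovered = [probability_from_cf(poisson_cf(1.5), x) for x in range(6)]
exact = [math.exp(-1.5) * 1.5 ** x / math.factorial(x) for x in range(6)]
largest_error = max(abs(r - e) for r, e in zip(recovered, exact))
print(largest_error)
```

Any other characteristic function of an integer-valued variable, for instance one obtained from (14) after p(n) and P(ξ, η) have been chosen, could be passed in place of `poisson_cf`.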

3. Particular classes of the limiting distribution of X. Until we have some 
experimental evidence as to what might be the nature of the two functions 
p(n) and f(x − ξ, y − η) or P(ξ, η), we may try a few guesses. If the results 
obtained in this way agree with empirical distributions, we shall have some 
reason to think that the guesses are not altogether wrong. 

In certain cases all the larvae considered are at the moment of observation 
approximately of the same age. Alternatively, we may count only larvae 
which are at the same stage of development. With such counts it is not 
unreasonable to try for p(n) either the binomial or the Poisson formula. Either 
of them will lead to easy calculations of (14). Writing 

(16)    $p(n) = e^{-\lambda}\, \dfrac{\lambda^{n}}{n!}$

with λ representing the average number of survivors at the moment of 
observation per unit mass of eggs, we shall get for φ(t) the following expression 

(17)    $\varphi(t) = \exp\left\{ -Am \left( 1 - \dfrac{1}{A} \iint_{A} e^{\lambda P(\xi, \eta)\,(e^{it} - 1)}\, d\xi\, d\eta \right) \right\}.$

Substituting here for P(ξ, η) any suitable function we shall obtain a 
corresponding particular form of the characteristic function φ(t), so that (17) 
determines a whole family of distributions. Substituting in (14) instead of (16), 
say the binomial formula, we shall obtain another family of contagious 
distributions. 

Strictly speaking, in order to obtain some particular distribution from the 
formula (17), we have to specify the function f(x − ξ, y − η), then to calculate 
P(ξ, η) and substitute it in (17). Since however we have no knowledge of the 
properties of f(x − ξ, y − η) and have to select it only on intuitive grounds, 
we may as well select the function P(ξ, η). It may be selected either by itself 
directly, in which case there will be no difficulty in substituting it in (17), or by 
some indirect method. In the other case we may find it more convenient to use 
another form of (17) which is obtained by expanding the exponential under the 
sign of the integral in (17) and by integrating term by term, which is obviously 
permissible. In this way we get 

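(18)    $\log \varphi(t) = Am \sum_{n=1}^{\infty} \dfrac{\lambda^{n} (e^{it} - 1)^{n}}{n!}\, P_n,$

where P_n stands for the expression 

(19)    $P_n = \dfrac{1}{A} \iint_{A} P^{n}(\xi, \eta)\, d\xi\, d\eta$

and has the form of a moment of nth order of a certain probability law which 
it is easy to determine. 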

We may consider for a moment the value of P(ξ, η) as a random variable Z. 
Its values cannot exceed the limits, zero and unity. Let z be any number 
between zero and unity and denote by AF(z) the measure of the set of points 
belonging to A where P(ξ, η) < z. Then the function F(z) will possess all the 
properties of the integral probability law of a variable Z which we may identify 
with P(ξ, η) and the integrals P_n will be simply the moments of Z, namely, 
$P_n = \int_0^1 z^{n}\, dF$, where, of course, the integral would be considered in the sense 
of Stieltjes. It is interesting to notice that P_1 is always equal to A^{-1}. To see 
this consider the integral 

(20)    $A P_1 = \iint_{A} P(\xi, \eta)\, d\xi\, d\eta,$

and substitute in it the expression of P(ξ, η) in terms of the function 
f(x − ξ, y − η). We get 

(21)    $A P_1 = \iint_{A} d\xi\, d\eta \iint_{P} f(x - \xi,\, y - \eta)\, dx\, dy$

(22)    $= \iiiint_{W} f(x - \xi,\, y - \eta)\, dx\, dy\, d\xi\, d\eta,$

where the four-dimensional region of integration W is defined as follows. (i) 
The variables x and y vary so that the point having them for its coordinates 
may have any position within, but cannot be outside, of the experimental plot P. 
(ii) When x and y are fixed in the above way, say x = x' and y = y', then ξ and η 
may assume all those values for which the function f(x' − ξ, y' − η) is positive. 
Let us denote this system of values of ξ and η by B(x', y'). Then we can 
calculate AP_1 as follows 

(23)    $A P_1 = \iint_{P} dx\, dy \iint_{B(x, y)} f(x - \xi,\, y - \eta)\, d\xi\, d\eta.$


Now it is easy to see that the second integral in (23) is always equal to unity, 
whatever be x and y satisfying (i). To see this we have to recall the 
fundamental property of the function f(x − ξ, y − η), due to the fact that it is the 
elementary probability law of x and y, namely that if ξ and η are fixed in one 
way or another, and it is integrated with respect to the other pair of variables, 
over all their values for which it is positive, the result will be equal to unity. 
In particular we shall have 

(24)    $\iint f(u, v)\, du\, dv = 1.$

Consider now the second integral in (23) and make the substitution 

(25)    $\xi = x - u, \qquad \eta = y - v$

so that, instead of ξ and η we shall now integrate for u and v. It will be seen 
that the result of this substitution is exactly the integral (24), equal to unity. 
Since it was assumed that the area of P is equal to unity, it follows that AP_1 = 1. 
This equality is thus the necessary condition that the function P(ξ, η) must 
satisfy. Besides, being a probability, it cannot be negative and cannot exceed 
unity. Whether any function having these properties may play the rôle of 
P(ξ, η) must be left for further inquiry. Assuming temporarily that this is so 
we can tentatively specify the probability laws belonging to the class determined 
by (18) by substituting in (18) instead of the P_n's the corresponding moments 
M_n of any distribution function F(z) with its range between zero and unity, 
remembering only the interpretation of its first moment that we have found 
above, namely M_1 = P_1 = A^{-1}. 
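
The identity AP_1 = 1 can be checked numerically for any assumed form of f. The sketch below takes, purely as an illustration, a larva displaced uniformly within a square of half-width `reach` and a unit square plot centred at the origin, so that A is a square of side 1 + 2·reach. None of these choices comes from the text, and the function names are hypothetical.

```python
def overlap(centre, reach):
    """Length of the intersection of [centre - reach, centre + reach] with the plot side [-0.5, 0.5]."""
    return max(0.0, min(centre + reach, 0.5) - max(centre - reach, -0.5))


def plot_probability(xi, eta, reach):
    """P(xi, eta) when a larva is displaced uniformly in a square of half-width `reach` (an assumed f)."""
    return overlap(xi, reach) * overlap(eta, reach) / (2.0 * reach) ** 2


def moment_integral(order, reach, cells=300):
    """Midpoint-rule value of the integral of P**order over A, that is, A * P_order."""
    side = 1.0 + 2.0 * reach             # A is the square of this side, centred on the plot
    step = side / cells
    total = 0.0
    for i in range(cells):
        xi = -side / 2.0 + (i + 0.5) * step
        for j in range(cells):
            eta = -side / 2.0 + (j + 0.5) * step
            total += plot_probability(xi, eta, reach) ** order
    return total * step * step


reach = 0.75
area_A = (1.0 + 2.0 * reach) ** 2
first = moment_integral(1, reach)     # A * P_1, expected to be close to 1
second = moment_integral(2, reach)    # A * P_2, which must lie below 1
print(area_A, first, second)
```

Under this assumed f the first integral agrees with unity up to the accuracy of the grid, whatever value of `reach` is chosen, while the second stays below unity.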


4. Certain general properties of the distributions deduced. Using the above 
result, we may substitute it in the formula (18) and get 

(26)    $\log \varphi(t) = m\lambda\,(e^{it} - 1) + Am \sum_{n=2}^{\infty} \dfrac{\lambda^{n} (e^{it} - 1)^{n}}{n!}\, P_n.$

Owing to the fact that the first term in the right hand side, mλ(e^{it} − 1), 
represents the logarithm of the characteristic function of the Poisson Law, 

(27)    $p(x) = e^{-m\lambda}\, \dfrac{(m\lambda)^{x}}{x!}$

for x = 0, 1, 2, ..., the formula (26) is especially interesting. Comparing the 
formulae 

(28)    $P_1 = \int_0^1 z\, dF(z) = A^{-1}, \qquad P_n = \int_0^1 z^{n}\, dF(z),$

we see that 0 < P_n < P_1 so that AP_n < 1. This circumstance assures the 
absolute and uniform convergence of (26). Frequently the higher moments 
P_n will be much smaller than the first, P_1, and if this tends to zero, all the 
products AP_n for n ≥ 2 will do so too. In those cases log φ(t) will tend to 
mλ(e^{it} − 1) uniformly for all values of t. To see this take an arbitrary ε > 0 
and select N so large that 


(29)    $m \sum_{n=N+1}^{\infty} \dfrac{(2\lambda)^{n}}{n!} < \dfrac{\varepsilon}{2}.$

Next let A_0 be large enough for 

(30)    $A P_n < \dfrac{\varepsilon}{2m}\, e^{-2\lambda}$

for all n = 2, 3, ..., N and for any A > A_0. For such values of A we shall 
have 

(31)    $\left| Am \sum_{n=2}^{\infty} \dfrac{\lambda^{n} (e^{it} - 1)^{n}}{n!}\, P_n \right| \le m \left( \sum_{n=2}^{N} \dfrac{\lambda^{n} \lvert e^{it} - 1 \rvert^{n}}{n!}\, A P_n + \sum_{n=N+1}^{\infty} \dfrac{\lambda^{n} \lvert e^{it} - 1 \rvert^{n}}{n!}\, A P_n \right) < \varepsilon$

independently of what is the value of t. This result may be formulated as 

Proposition I. If the parameters m and λ remain constant but the probability 
law F(z) is changed so that all the products AP_n tend to zero for n = 2, 3, ..., 
then log φ(t) tends to mλ(e^{it} − 1) uniformly for all values of t and, consequently, the 
corresponding probability law of X tends to that of Poisson, given by (27). 

The above proposition may be considered as an explanation of the circum- 
stance that occasionally the distribution of larvae may be very close to that of 
Poisson. This may happen for instance when the larvae that we count are 
sufficiently old and have had a sufficient time to travel very far from the spot 
where they were hatched. In such cases A will be large and, if the function 
f(x − ξ, y − η) has some appropriate properties, all the products AP_n may be 
very small. But it is interesting to notice that there is a possibility of A 
increasing without the products AP_n tending to zero. Such will be for instance 
the case if P(ξ, η) could have within A only two values B_1(A) and B_2(A) changing 
with A, one close to unity and the other close to zero. If Ap and Aq are the 
areas of the parts of A where P(ξ, η) has those two different values, then we 
shall have 

(32)    $P_1 = p\,B_1(A) + q\,B_2(A) = A^{-1}, \qquad P_n = p\,B_1^{n}(A) + q\,B_2^{n}(A)$

and 

(33)    $A P_n = \dfrac{p\,B_1^{n}(A) + q\,B_2^{n}(A)}{p\,B_1(A) + q\,B_2(A)}$

may tend to unity as A is increased. In such cases the probability law of X 
will not tend to (27). While calling attention to this possibility, it should be 
emphasized that it is not likely to occur in practice. In the cases of 
discontinuous F(z) considered below P{X} does tend to (27). The same is true also 
in such cases where it is assumed that 

(34)    $\dfrac{dF}{dz} = \begin{cases} a + bz > 0 & \text{for } 0 < z < c < 1 \\ 0 & \text{elsewhere} \end{cases}$

etc. 

Before proceeding to specialize the expression (26) of the logarithm of the 
characteristic function, we shall show the connection existing between the P_n's 
and the semi-invariants of X. To calculate the latter it is sufficient to 
differentiate (26) with respect to t, to put t = 0, and to divide the result by the 
appropriate power of i. Denoting by γ_k the kth semi-invariant, by μ'_1 the first 
moment about zero, and by μ_k the kth central moment of X we easily get 

(35)    $\begin{aligned}
\mu'_1 &= \gamma_1 = m\lambda \\
\mu_2 &= \gamma_2 = m\lambda\,(1 + A\lambda P_2) \\
\mu_3 &= \gamma_3 = m\lambda\,(1 + 3A\lambda P_2 + A\lambda^{2} P_3) \\
\mu_4 - 3\mu_2^{2} &= \gamma_4 = m\lambda\,(1 + 7A\lambda P_2 + 6A\lambda^{2} P_3 + A\lambda^{3} P_4)
\end{aligned}$

etc. 

It will be seen that, in general, the kth semi-invariant depends on P_2, P_3, 
..., P_k only. Another property of the new distributions that we shall mention 
is that they are “stable”. 

Proposition II. If X_1, X_2, ..., X_s are s independent random variables all 
following the same distribution with the logarithm of the characteristic function 
given by (26), then the sum $Y = \sum_{i=1}^{s} X_i$ will follow the same probability law with 
the exception that instead of the parameter m it will depend on the product sm. 

In order to establish this proposition it is sufficient to notice that the logarithm 
of the characteristic function of the variable Y is equal to the expression (26) 
multiplied by s. 
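
The formulae (35) and Proposition II can be illustrated numerically. Differentiating (18) term by term suggests that the kth semi-invariant equals Am Σ S(k, n) λ^n P_n, summed over n = 1, ..., k, where S(k, n) are the Stirling numbers of the second kind. This compact form is a restatement introduced for the sketch and is not printed in the text, although it reproduces (35) for k = 1, ..., 4. The numerical values and the function names below are assumptions chosen only for illustration.

```python
def stirling2(k, n):
    """Stirling number of the second kind S(k, n), by the standard recurrence."""
    if k == n:
        return 1
    if n == 0 or n > k:
        return 0
    return n * stirling2(k - 1, n) + stirling2(k - 1, n - 1)


def semi_invariant(k, m, lam, area, moments):
    """k-th semi-invariant as A m * sum over n of S(k, n) lam**n P_n, with moments[n] = P_n."""
    return area * m * sum(stirling2(k, n) * lam ** n * moments[n] for n in range(1, k + 1))


# Illustrative values only; P_n = A**(-n) corresponds to a constant P within A.
m, lam, area = 0.5, 4.0, 2.0
moments = {n: area ** (-n) for n in range(1, 5)}
gammas = [semi_invariant(k, m, lam, area, moments) for k in range(1, 5)]

# Proposition II: replacing m by s * m multiplies every semi-invariant by s.
s = 3
gammas_sum = [semi_invariant(k, s * m, lam, area, moments) for k in range(1, 5)]
print(gammas)
print(gammas_sum)
```

Since every semi-invariant is proportional to m, the semi-invariants of a sum of s independent variables are those of a single variable with m replaced by sm, in agreement with the proposition.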

Lastly, it may be noticed that the family of distributions determined by (26) 
is different from the comparable distributions deduced by Pólya ([3], p. 153, 
formulae (40) and (41)). In fact the logarithms of the characteristic functions 
of the latter could be written as follows: 

(36)    $-a \log\bigl(1 - b\,(e^{it} - 1)\bigr) = ab\,(e^{it} - 1) + a \sum_{n=2}^{\infty} \dfrac{b^{n} (e^{it} - 1)^{n}}{n}$

and 

(37)    $\dfrac{c\,(e^{it} - 1)}{1 - d\,e^{it}} = \dfrac{c\,(e^{it} - 1)}{1 - d} + \dfrac{c}{1 - d} \sum_{n=2}^{\infty} \left( \dfrac{d}{1 - d} \right)^{n-1} (e^{it} - 1)^{n}$

respectively and, even if the formal expansions in powers of (e^{it} − 1) converge, 
the identification of those expansions with (26) would require that P_n possess 
values exceeding unity, which is inconsistent with their essential property of 
being successive moments of a positive variable 0 < Z ≤ 1. Of course, the 
convergence of (36) and (37) would impose special restrictions on the constants 
that those formulae involve. 


5. Contagious distribution of type A depending on two parameters. The 
simplest assumption that we can make concerning the function P(ξ, η) is that it 
possesses some constant positive value within A and is zero elsewhere. Owing 
to (20) this constant value must be equal to A^{-1}. Substituting this in (17) we 
immediately obtain, say 

(38)    $\varphi_1(t) = \exp\left\{ -Am \left( 1 - e^{\frac{\lambda}{A}(e^{it} - 1)} \right) \right\}.$

We could use the above formula directly to obtain the corresponding probability 
law. But before doing so, it may be useful to illustrate the machinery of the 
alternative method of obtaining the characteristic function of X and to calculate 
the same formula using (26). 

If P(ξ, η) is equal to A^{-1} everywhere in A, this means that the function F(z) 
is a step function, which is equal to zero for any z < A^{-1} and is equal to unity 
elsewhere. Accordingly we shall have P_n = A^{-n}. Substituting this into (26) 
instead of P_n we easily get 

(39)    $\log \varphi_1(t) = Am \left( e^{\frac{\lambda}{A}(e^{it} - 1)} - 1 \right)$

which is equivalent with (38). 

We shall now proceed to the calculation of the probabilities P{X = k} as 
determined by either (38) or (39). For this purpose it will be useful to notice 
that the characteristic function (38) depends really on two parameters only, 
which we shall denote by m_1 and m_2, 

(40)    $m_1 = Am, \qquad m_2 = \lambda / A.$

In order to simplify the printing we shall further denote 

(41) 2 = 


Expanding the two first exponentials of the three involved in (38), we may write 

(42)    $\varphi_1(t) = e^{-m_1} \sum_{j=0}^{\infty} \dfrac{m_1^{j}\, e^{-j m_2}}{j!} \sum_{n=0}^{\infty} \dfrac{(j m_2)^{n}}{n!}\, e^{itn}.$

This is the form of the characteristic function which is the most convenient 
when we have in mind applying the formula (15). In fact, it will be seen that 
we may multiply (42) by e^{−itk} and then integrate the series term by term. 
Further, it will be noticed that, on integrating between the limits −π and +π, 
all the terms of the product will vanish except for the one which is independent 
of t. Consequently, the result of substituting (42) in the right hand side of (15) 
will be the coefficient of e^{itk} in the expansion (42), so that 

(43)    $P\{X = k\} = e^{-m_1}\, \dfrac{m_2^{k}}{k!} \sum_{j=0}^{\infty} \dfrac{j^{k}}{j!} \left( m_1 e^{-m_2} \right)^{j}.$

As it is easy to verify, we have 

(44)    $P\{X = 0\} = e^{-m_1 (1 - e^{-m_2})}$

and, for k ≥ 1 

(45)    $P\{X = k\} = e^{-m_1}\, \dfrac{m_2^{k}}{k!} \left[ \dfrac{d^{k}\, e^{m_1 e^{u}}}{du^{k}} \right]_{u = -m_2}.$

This formula gives an easy check of the identity $\sum_{n=0}^{\infty} P\{X = n\} = 1$. In fact, 
the left hand side can be looked upon as a product of e^{−m_1} by the Taylor's 
expansion of the function differentiated in (45), taken at the point u = −m_2 
with the increment m_2, which gives identically unity. 

Successive differentiations give in turn 

(46)    $P\{X = 1\} = e^{-m_1 (1 - e^{-m_2})}\, m_1 m_2\, e^{-m_2}$

(47)    $P\{X = 2\} = e^{-m_1 (1 - e^{-m_2})}\, \dfrac{m_1 m_2^{2}\, e^{-m_2}}{2} \left[ 1 + m_1 e^{-m_2} \right]$

etc. Comparing the formulae (44), (46) and (47), the effect of the 
“contagiousness” of the distribution is easily seen. P{X = 2} differs from what it would 
have been, if the distribution was that of Poisson, by the additional term 
within the brackets. 
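
The series (43) can be evaluated directly and compared with the closed forms (44), (46) and (47). The sketch below truncates the series after a fixed number of terms, which is an assumption adequate only for moderate values of m_1, m_2 and k; the parameter values and the function name are chosen purely for illustration.

```python
import math


def type_a_series(k, m1, m2, terms=200):
    """P{X = k} as e^{-m1} m2^k / k! times the sum over j of j^k (m1 e^{-m2})^j / j!."""
    z = m1 * math.exp(-m2)
    total = 0.0
    weight = 1.0                       # holds z**j / j!, updated in place
    for j in range(terms):
        if j > 0:
            weight *= z / j
        total += (j ** k) * weight     # for j = k = 0 Python evaluates 0 ** 0 as 1
    return math.exp(-m1) * m2 ** k / math.factorial(k) * total


m1, m2 = 2.0, 1.5
p0 = math.exp(-m1 * (1.0 - math.exp(-m2)))                                   # formula (44)
p1 = p0 * m1 * m2 * math.exp(-m2)                                            # formula (46)
p2 = p0 * 0.5 * m1 * m2 ** 2 * math.exp(-m2) * (1.0 + m1 * math.exp(-m2))    # formula (47)
series = [type_a_series(k, m1, m2) for k in range(61)]
print(series[:3], (p0, p1, p2), sum(series))
```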

Formulae (44), (46) and (47), and others which could be obtained by 
differentiating as indicated in (45), could be used for numerical calculations. 
However, these are greatly simplified by the use of the following elegant formula, 
deduced by Dr. Geoffrey Beall of the Dominion Entomological Experimental 
Station, Chatham, Ontario. 

(48)    $P\{X = n + 1\} = \dfrac{m_1 m_2\, e^{-m_2}}{n + 1} \sum_{t=0}^{n} \dfrac{m_2^{t}}{t!}\, P\{X = n - t\}.$

The correctness of this formula may be easily checked by calculating 
P{X = n − t} from (43) and by substituting it in (48). Simple rearrangements will 
then give what could be obtained from (43) by putting k = n + 1. 
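
The recurrence (48) translates directly into a short routine. The following is a minimal sketch with a hypothetical function name and illustrative parameter values; it starts from (44) and checks the resulting probabilities against the mean m_1 m_2 and the variance m_1 m_2 (1 + m_2) of the distribution.

```python
import math


def type_a_pmf(m1, m2, kmax):
    """Probabilities P{X = 0}, ..., P{X = kmax} of the two-parameter law by the recurrence (48)."""
    probs = [math.exp(-m1 * (1.0 - math.exp(-m2)))]         # starting value, formula (44)
    factor = m1 * m2 * math.exp(-m2)
    for n in range(kmax):
        inner = sum(m2 ** t / math.factorial(t) * probs[n - t] for t in range(n + 1))
        probs.append(factor / (n + 1) * inner)
    return probs


probs = type_a_pmf(2.0, 1.5, 80)
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(total, mean, variance)
```

Each new probability needs only the previously computed ones, so a whole table up to n = kmax costs a number of operations of the order of kmax squared.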

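Substituting P_n = A^{-n} in formulae (35) and taking account of (40), we get 

(49)    $\mu'_1 = \lambda m = m_1 m_2$

(50)    $\mu_2 = m\lambda \left( 1 + \dfrac{\lambda}{A} \right) = m_1 m_2\,(1 + m_2).$

Solving these equations for m_1 and m_2 we obtain the formulae 

(51)    $m_2 = \dfrac{\mu_2 - \mu'_1}{\mu'_1}, \qquad m_1 = \dfrac{\mu'_1}{m_2}.$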

If the moments μ'_1 and μ_2 are determined for an empirical distribution, these 
formulae may be used for estimating m_1 and m_2. In cases which were tried, 
this process did give frequently a satisfactory fit. Sometimes, however, when 
the tail of the original empirical distribution was very irregular, this distribution 
was better approximated by calculating the moments μ'_1 and μ_2 not from itself 
but after a certain amount of smoothing of the tail. It follows that the method 
of fitting the new distribution to the empirical data requires some further study. 
At present it will suffice to mention that, whenever this distribution was tried 
on distributions of larvae which at the moment of counts were approximately 
of the same stage of development, the fit obtained was very satisfactory. It is 
hoped that a number of actual distributions fitted, together with the description 
of the method of counting, etc., will be soon published by Dr. Beall. As a matter 
of illustration one of his distributions is reproduced at the end of the present 
paper. 
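
The estimation by moments described above can be sketched in a few lines. The counts used below are invented solely to exercise the code and are not data from the paper; the function name is hypothetical, and the refusal to fit when the variance does not exceed the mean is a design choice of the sketch.

```python
def fit_type_a_by_moments(counts):
    """Estimate (m1, m2) from plot counts using m2 = (mu2 - mu1') / mu1' and m1 = mu1' / m2."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / n     # second central moment
    if variance <= mean:
        raise ValueError("variance does not exceed the mean; the moment equations have no positive solution")
    m2 = (variance - mean) / mean
    m1 = mean / m2
    return m1, m2


# Purely illustrative counts, invented for this sketch.
sample = [0, 0, 0, 1, 0, 3, 5, 0, 2, 0, 0, 7, 1, 0, 4, 0, 0, 2, 6, 1]
m1_hat, m2_hat = fit_type_a_by_moments(sample)
print(m1_hat, m2_hat)
```

The smoothing of an irregular tail mentioned in the text is not attempted here; it would have to be applied to the counts before they are passed to the function.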
As for the distribution considered we have 

(52)    $\lim_{A \to \infty} A P_n = \lim_{A \to \infty} A^{-(n-1)} = 0, \qquad n = 2, 3, \ldots$

It follows from the above theory that, as A → ∞, the probability law (48) tends 
to that of Poisson, namely 

(53)    $\lim_{A \to \infty} P\{X = n\} = e^{-m_1 m_2}\, \dfrac{(m_1 m_2)^{n}}{n!}.$


For this reason the distribution (48) could be perhaps called the generalized 
probability law of Poisson, but it seems that the term “contagious distribution 
of type A with two parameters” will be more descriptive. Further on we shall 
see what is the justification of the description “of type A”. 
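
The approach to the Poisson Law as A increases can be observed numerically. In the sketch below m and λ are held fixed, so that m_1 m_2 = mλ stays constant while m_1 = Am grows and m_2 = λ/A shrinks. The probabilities are computed by the recurrence (48); the parameter values and function names are assumptions made for the illustration.

```python
import math


def type_a_pmf(m1, m2, kmax):
    """Probabilities P{X = 0}, ..., P{X = kmax} by the recurrence (48), started from (44)."""
    probs = [math.exp(-m1 * (1.0 - math.exp(-m2)))]
    factor = m1 * m2 * math.exp(-m2)
    for n in range(kmax):
        inner = sum(m2 ** t / math.factorial(t) * probs[n - t] for t in range(n + 1))
        probs.append(factor / (n + 1) * inner)
    return probs


def max_gap_to_poisson(area, m, lam, kmax=30):
    """Largest absolute difference between the type A law and the Poisson law with mean m * lam."""
    type_a = type_a_pmf(area * m, lam / area, kmax)
    mean = m * lam
    poisson = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(kmax + 1)]
    return max(abs(a - b) for a, b in zip(type_a, poisson))


gaps = [max_gap_to_poisson(area, 0.5, 4.0) for area in (2.0, 20.0, 200.0)]
print(gaps)
```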

It was stated at the outset of the present paper that, when comparing the 
distributions of larvae in two series of plots subjected to two different treat- 
ments, there is sometimes doubt whether the means of those distributions are 
equal or not, while the difference in variability is more or less obvious. The 
formulae (49) and (50) give us the explanation of these facts. It is seen from 
the formula (49) that the mean of the distribution is equal to the product of the 
mean number of masses of eggs per unit of area and of the mean number of 
larvae per mass of eggs surviving at the moment of counts. If the two 
treatments compared are of about the same efficiency of killing the larvae, then the 
values of λ for each of them will be approximately equal and, consequently, 
we shall obtain about the same values for the two means. But while being of an 
equal efficiency as far as the killing is concerned, the two treatments may annoy 
the larvae in an unequal way. For example if the first treatment is dummy 
(no treatment) and the other is in general ineffective, it may still spoil the taste 
of the leaves that the larvae feed on. In such a case they may be compelled to 
travel a little more than they would otherwise, which will lead to an increase in A. 
Looking at the formula (50), it is easy to see that this would lead to a decrease 
in the value of μ_2. Alternatively the treatment may produce a temporary 
paralysis of the larvae which may reduce A and bring an increase of μ_2. 

These remarks were applied to moments (49) and (50) of the particular dis- 
tribution (45), but looking at the formulae (35), it is easily seen that they are 
true in the general case also. 
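
A small numerical illustration of this argument, using (49) and (50) with m_1 = Am and m_2 = λ/A. The values of m, λ and A are invented for the example and the function name is hypothetical.

```python
def mean_and_variance(m, lam, area):
    """Mean and variance from formulae (49) and (50), with m1 = A m and m2 = lam / A."""
    m1, m2 = area * m, lam / area
    return m1 * m2, m1 * m2 * (1.0 + m2)


control = mean_and_variance(m=0.5, lam=4.0, area=2.0)    # larvae stay close to the mass of eggs
treated = mean_and_variance(m=0.5, lam=4.0, area=4.0)    # larvae travel further, so A is larger
print(control)
print(treated)
```

With these values the two means coincide at 2 while the variance falls from 6 to 4 as A is doubled, so the two series of plots would differ in variability but not in mean.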


6. Contagious distributions of type A depending on three parameters. As 
mentioned before, in order to determine some particular contagious distribution 
contained in the class depending on equation (18) it is sufficient to substitute 
in it instead of the Pn the moments of any distribution with its range confined 
to the interval from zero to unity, with the only restriction that the reciprocal 
of the first moment should be equal to A. Obviously this could be done in an 
infinity of ways, all of which will give more or less different results. We shall 
select the following one, representing a natural generalization of the procedure 
adopted above and leading to very simple formulae. 



ON A NEW CLASS OF “CONTAGIOUS” DISTRIBUTIONS 




Formerly we have assumed that P(ξ, η) possesses a constant value within 
the whole area A. At present we may assume that within this area it may 
possess one of two (three, four, etc.) values, say B_1 and B_2. Considering again 
P(ξ, η) as a random variable Z, this will be equivalent to an assumption that Z 
may possess only one of the values B_1 and B_2, both positive and not exceeding 
unity. Again the probabilities of Z = B_i are at our disposal. We shall take 
that these probabilities are equal, i.e. equal to 1/2. 

Comparing these assumptions with what may be the actual situation, one 
may be led to think that they are rather artificial. This however is not so. 
There is no doubt that the value of P(ξ, η) does change within A, and it is also 
probable that the change is smooth. As we have no knowledge of the character 
of this function we first take its mean value within the area A and treat it as its 
first approximation. Next we divide the area A into two equal parts, say A_1 
and A_2, so that the greatest value of P(ξ, η) in A_1 does not exceed any of the 
values in A_2. Then taking the average of P(ξ, η) within A_1 and a similar 
average within A_2 and denoting them by B_1 and B_2 respectively, we do obtain a 
better approximation to the actual values of P(ξ, η) assuming that it is equal 
to B_i everywhere in A_i. That is, in fact, the real meaning of the hypothesis 
formulated above and that we are going to accept in the following. 

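Denoting again by $M_n$ the moments of Z we shall have

(54)    $M_1 = \tfrac{1}{2}(B_1 + B_2) = A^{-1}$

and generally

(55)    $M_n = \tfrac{1}{2}(B_1^n + B_2^n)$.

Substituting (55) in (26) we get, say

(56)    $\log \phi(t) = \tfrac{1}{2} m_1 \left( e^{m_2(e^{it}-1)} + e^{m_3(e^{it}-1)} - 2 \right)$.

We notice that this expression depends on three parameters, say

(57)    $m_1 = Am, \quad m_2 = \lambda B_1, \quad m_3 = \lambda B_2$.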

In order to get the formulae for the probabilities of X having any specified values
we could again apply the method used above when treating the simpler case.
It may be useful however to illustrate a shorter way which easily leads to a
generalization of Dr. Beall's recurrence formula. As we have noticed before,
the probability $P\{X = k\}$ is equal to the coefficient of $e^{itk}$ in the expansion of the
characteristic function in powers of $e^{it}$. Substituting for simplicity $z = e^{it}$,
so that $t = -i \log z$, we may say that, if $\phi(t)$ is the characteristic function of a
variable X which is able to possess only integer values, then $P\{X = k\}$ is equal
to the coefficient of $z^k$ in the expansion of, say, $\psi(z) = \phi(-i \log z)$. Applying
this rule to (56) we can write the following expression for the generating function $\psi(z)$,

(58)    $\psi(z) = \exp\left\{ \tfrac{1}{2} m_1 \left( e^{m_2(z-1)} + e^{m_3(z-1)} - 2 \right) \right\} = \sum_{k=0}^{\infty} z^k P\{X = k\}$.

In other words

(59)    $P\{X = 0\} = \psi(0) = \exp\left\{ \tfrac{1}{2} m_1 \left( e^{-m_2} + e^{-m_3} - 2 \right) \right\}$,

(60)    $P\{X = k\} = \dfrac{1}{k!} \left. \dfrac{d^k \psi}{dz^k} \right|_{z=0}, \qquad k = 1, 2, \ldots$.

But

(61)    $\dfrac{d\psi}{dz} = \psi(z) \cdot \tfrac{1}{2} m_1 \left\{ m_2 e^{m_2(z-1)} + m_3 e^{m_3(z-1)} \right\} = \psi(z)\chi(z)$ (say)

and it is easy to see that generally

(62)    $\dfrac{d^k \chi}{dz^k} = \tfrac{1}{2} m_1 \left\{ m_2^{k+1} e^{m_2(z-1)} + m_3^{k+1} e^{m_3(z-1)} \right\}$.

As the kth derivative of $\psi(z)$ in (60) may be calculated by applying the familiar
formula for the (k − 1)st derivative of the product $\psi(z)\chi(z)$ in (61), we obtain

(63)    $\dfrac{d^{n+1} \psi}{dz^{n+1}} = \sum_{k=0}^{n} \dfrac{n!}{k!\,(n-k)!} \, \dfrac{d^k \chi}{dz^k} \, \dfrac{d^{n-k} \psi}{dz^{n-k}}$.

Using the formulae (60) and (62) we immediately obtain

(64)    $P\{X = n + 1\} = \dfrac{m_1}{2(n+1)} \sum_{k=0}^{n} \dfrac{m_2^{k+1} e^{-m_2} + m_3^{k+1} e^{-m_3}}{k!} \, P\{X = n - k\}$.

As whenever $B_1 = B_2$, and consequently $m_2 = m_3$, the distribution considered
now becomes identical with that considered formerly, depending on two parameters
only, it is seen that the formula (64) represents a direct generalization
of the formula (48). For purposes of successive calculation of the probabilities
it will probably be more convenient to write (64) in the following form

(65)    $P\{X = n + 1\} = \dfrac{m_1 m_2 e^{-m_2}}{2(n+1)} \sum_{k=0}^{n} \dfrac{m_2^k}{k!} P\{X = n - k\} + \dfrac{m_1 m_3 e^{-m_3}}{2(n+1)} \sum_{k=0}^{n} \dfrac{m_3^k}{k!} P\{X = n - k\}$.

This device of finding a recurrence formula for the probabilities will always
succeed whenever there are no difficulties in finding the value of the nth derivative
of the function $\chi$.
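The successive calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `type_a3_probabilities` and the truncation point `n_max` are assumptions introduced here, and the starting value is taken to be $P\{X = 0\} = \psi(0)$ with $\psi$ the generating function of the three-parameter law.

```python
import math


def type_a3_probabilities(m1, m2, m3, n_max):
    """Return [P{X=0}, ..., P{X=n_max}] for the three-parameter type A law.

    Illustrative sketch: start from P{X=0} = psi(0) and apply the
    recurrence for P{X=n+1} in terms of P{X=n-k}, k = 0, ..., n.
    """
    # P{X = 0} = exp{(m1/2)(e^{-m2} + e^{-m3} - 2)}
    probs = [math.exp(0.5 * m1 * (math.exp(-m2) + math.exp(-m3) - 2.0))]
    c2 = 0.5 * m1 * m2 * math.exp(-m2)  # factor of the m2 sum
    c3 = 0.5 * m1 * m3 * math.exp(-m3)  # factor of the m3 sum
    for n in range(n_max):
        total = 0.0
        t2 = t3 = 1.0  # running values of m2^k / k! and m3^k / k!
        for k in range(n + 1):
            total += (c2 * t2 + c3 * t3) * probs[n - k]
            t2 *= m2 / (k + 1)
            t3 *= m3 / (k + 1)
        probs.append(total / (n + 1))
    return probs


if __name__ == "__main__":
    for k, p in enumerate(type_a3_probabilities(2.0, 0.5, 1.5, 8)):
        print(k, round(p, 6))
```

Setting `m2 == m3` reduces the computation to the two-parameter case, which gives a quick consistency check on any implementation.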

It may be easily shown that if m and $\lambda$ remain fixed but A tends to infinity,
then the distribution (60) tends to the Poisson Law of frequency. Owing to
the general result stated in Proposition I, in order to show this it is only sufficient
to prove that for $n \geq 2$

(66)    $\lim_{A \to \infty} A M_n = \lim_{A \to \infty} \dfrac{B_1^n + B_2^n}{B_1 + B_2} = 0$.

As both $B_1$ and $B_2$ must be included between zero and unity and their sum is
equal to $2A^{-1}$, it follows that

(67)    $0 < B_1 \leq B_2 < 2A^{-1}$.

Therefore

(68)    $0 < A M_n < \dfrac{2^n}{A^{n-1}}$

and (66) becomes obvious.
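A small numerical illustration of this limit, under assumptions that are not in the text: the split $B_1 = 0.5/A$, $B_2 = 1.5/A$ (any pair with $B_1 + B_2 = 2/A$ would do), the values m = 1 and $\lambda$ = 3, and the helper names below are all chosen only for the sketch. It compares the three-parameter probabilities with the Poisson law of mean $\lambda m$.

```python
import math


def type_a3_probabilities(m1, m2, m3, n_max):
    """Three-parameter type A probabilities by the recurrence (sketch)."""
    # expm1 keeps accuracy when m2 and m3 are very small
    probs = [math.exp(0.5 * m1 * (math.expm1(-m2) + math.expm1(-m3)))]
    c2 = 0.5 * m1 * m2 * math.exp(-m2)
    c3 = 0.5 * m1 * m3 * math.exp(-m3)
    for n in range(n_max):
        total, t2, t3 = 0.0, 1.0, 1.0
        for k in range(n + 1):
            total += (c2 * t2 + c3 * t3) * probs[n - k]
            t2 *= m2 / (k + 1)
            t3 *= m3 / (k + 1)
        probs.append(total / (n + 1))
    return probs


def max_gap_to_poisson(A, m=1.0, lam=3.0, n_max=40):
    """Largest absolute difference between the type A and Poisson terms."""
    b1, b2 = 0.5 / A, 1.5 / A  # hypothetical split with b1 + b2 = 2/A
    probs = type_a3_probabilities(A * m, lam * b1, lam * b2, n_max)
    mean = lam * m
    poisson = [math.exp(-mean)]
    for k in range(1, n_max + 1):
        poisson.append(poisson[-1] * mean / k)
    return max(abs(p - q) for p, q in zip(probs, poisson))


if __name__ == "__main__":
    for A in (10.0, 100.0, 1000.0, 10000.0):
        print(A, max_gap_to_poisson(A))
```

The printed gaps should shrink as A grows, which is all the sketch is meant to show; it is not a proof and says nothing about the rate of convergence in general.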

Substituting the values of $M_2$ and $M_3$ instead of $P_2$ and $P_3$ in the general
expressions (35) of the moments, and taking into account the formulae (57),
we obtain

(69)    $\mu'_1 = \tfrac{1}{2} m_1 (m_2 + m_3)$
        $\mu_2 = \tfrac{1}{2} m_1 (m_2 + m_3 + m_2^2 + m_3^2)$
        $\mu_3 = \tfrac{1}{2} m_1 (m_2 + m_3 + 3(m_2^2 + m_3^2) + m_2^3 + m_3^3)$.

If it is desired to fit the distribution to some empirical one using the method of
moments, then these formulae could be solved with respect to $m_1$, $m_2$ and $m_3$.
We may proceed as follows. Write

(70)    $a = 2\mu'_1, \quad b = 2(\mu_2 - \mu'_1), \quad c = 2(\mu_3 - 3\mu_2 + 2\mu'_1)$.

Then

(71)    $m_1 (m_2 + m_3) = a$

(72)    $m_1 (m_2^2 + m_3^2) = b$

(73)    $m_1 (m_2^3 + m_3^3) = c$.

Multiplying the first of these equations by $m_2$ and subtracting the result from
the second and repeating the same process with the second equation and the
third, we get

(74)    $m_1 m_3 (m_3 - m_2) = b - a m_2$
        $m_1 m_3^2 (m_3 - m_2) = c - b m_2$

and it follows

(75)    $m_3 = \dfrac{c - b m_2}{b - a m_2}$

or

(76)    $\dfrac{b}{a}(m_2 + m_3) - m_2 m_3 = \dfrac{c}{a}$.

Again, dividing (73) by (71) we get

(77)    $(m_2 + m_3)^2 - 3 m_2 m_3 = \dfrac{c}{a}$.

Multiplying (76) by 3 and subtracting from (77), we obtain

(78)    $s^2 - 3bs/a + 2c/a = 0$,

where $s = m_2 + m_3$.

It follows that

(79)    $s = \dfrac{3b \pm \sqrt{9b^2 - 8ac}}{2a}$

(80)    $m_2 m_3 = p = \dfrac{bs - c}{a}$

(81)    $m_2 = \tfrac{1}{2}\left(s - \sqrt{s^2 - 4p}\right)$

(82)    $m_3 = \tfrac{1}{2}\left(s + \sqrt{s^2 - 4p}\right)$

(83)    $m_1 = a/s$.

Following these steps we finally arrive at the values of all three parameters,
given by the last three formulae.
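The steps of this moment fit can be sketched as follows. The function name `fit_type_a3_by_moments` is an assumption, the inputs are taken to be the mean, the variance and the third central moment, and the quadratic solved for $s = m_2 + m_3$ is $a s^2 - 3bs + 2c = 0$, which is what eliminating $m_2 m_3$ between (76) and (77) gives. Since the quadratic has in general two roots, the sketch returns every admissible candidate rather than choosing one; which root to prefer is left open here.

```python
import math


def fit_type_a3_by_moments(mean, var, mu3):
    """Candidate (m1, m2, m3) triples with m2 <= m3 (illustrative sketch)."""
    a = 2.0 * mean
    b = 2.0 * (var - mean)
    c = 2.0 * (mu3 - 3.0 * var + 2.0 * mean)
    disc = 9.0 * b * b - 8.0 * a * c  # discriminant of a s^2 - 3 b s + 2 c = 0
    if disc < 0.0:
        return []  # no real s: the calculation cannot be carried out
    solutions = []
    for sign in (1.0, -1.0):
        s = (3.0 * b + sign * math.sqrt(disc)) / (2.0 * a)
        if s <= 0.0:
            continue
        p = (b * s - c) / a  # p = m2 * m3
        inner = s * s - 4.0 * p
        if inner < 0.0:
            continue  # s^2 - 4p negative: m2, m3 would not be real
        root = math.sqrt(inner)
        m2, m3 = 0.5 * (s - root), 0.5 * (s + root)
        if m2 < 0.0:
            continue
        solutions.append((a / s, m2, m3))
    return solutions


if __name__ == "__main__":
    # Moments computed from m1 = 2, m2 = 0.5, m3 = 1.5
    for sol in fit_type_a3_by_moments(2.0, 4.5, 13.0):
        print(sol)
```

An empty result signals exactly the kind of breakdown that sampling errors in the moments can cause.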

If the values of the moments $\mu'_1$, $\mu_2$ and $\mu_3$ were known without error, the
above formulae would give accurate values of $m_1$, $m_2$ and $m_3$. If, however,
the moments are estimated from a sample, then the reader must be prepared
that, even if the observed variable follows exactly the law, occasionally the
sampling errors in the moments will make it impossible to carry out all the
calculations indicated. Especially this may easily happen when the true values
of $m_2$ and $m_3$ are equal or nearly equal, so that the empirical distribution is close
to that given by the contagious distribution with only two parameters. As it is
seen from (81) and (82), in such a case the true values of s and p must satisfy
the relation

(84)    $s^2 - 4p = 0$.

However, the sampling errors in the moments will ascribe to the left hand side
of (84) a value only approximately equal to zero, which may be either positive
or negative. In the latter case we shall not be able to use (81) and (82) to
estimate $m_2$ and $m_3$. As a matter of fact, the above circumstance actually arose
in one case when it was tried to fit the three-parameter distribution to a set
of data which were excellently fitted by a simpler formula (46) involving only
two parameters. As mentioned before, the problem of fitting the distributions
which are deduced here requires further consideration.

Looking back on the method by which we have substituted a contagious
distribution with three parameters $m_1$, $m_2$, $m_3$ for the simpler one with only
two parameters, it is easily seen that it can be carried further, leading to distributions
with four, five, etc. parameters. In each case we would mentally divide
the area A into a number of parts of equal size so that the values of $P(\xi, \eta)$ in the
first never exceed those in the second, etc. Denoting the average values of
$P(\xi, \eta)$ in those areas by $B_1, B_2, \ldots, B_r$, we shall obtain the moments

(85)    $M_n = \dfrac{1}{r} \sum_{j=1}^{r} B_j^n$,

substitute them in (26) and proceed more or less as we did above. All the
distributions which may be obtained in this way possess certain common traits
and I propose to call them "of type A". If the number of parameters in such a
distribution is sufficiently high, it seems practically certain that the function
$P(\xi, \eta)$ will be well approximated and we may hope to get an excellent fit.
However, if a good fit may be attained only by introducing a great number of
parameters, it usually means that the method of introducing those parameters
is not very successful, and therefore it does not seem worth while to discuss in
greater detail the distributions of type A with the number of parameters exceeding
three. Instead we shall briefly indicate another class of distributions, built
on another principle, which may be called of type B or C.


7. Contagious distributions of types B and C. As mentioned before, whenever
the distributions of type A were tried on data, the character of which did
not obviously contradict the basic assumptions of the theory (approximate
equality of age of the larvae), the results were always satisfactory. However,
our present experience is rather limited and it is well to anticipate the failures.
We may expect that these will be caused by the over-simplified assumptions
concerning the function $P(\xi, \eta)$.

In order to deal with such a case we may assume that for 0 < z < 1 the
derivative of F(z) exists and is either a linear function of z or is equal to zero.
Writing p(z) = dF/dz we shall put

(86)    $p_3(z) = \tfrac{1}{2} A$ for $0 < z < 2A^{-1}$, $A \geq 2$,
        $p_3(z) = 0$ elsewhere.

Alternatively we may write, say

(87)    $p_4(z) = \dfrac{2A}{3}\left(1 - \dfrac{Az}{3}\right)$ for $0 < z < 3A^{-1}$, $A \geq 3$,
        $p_4(z) = 0$ elsewhere.

The moments of $p_3(z)$ will be given by

(88)    $M_n = \dfrac{2^n}{(n+1) A^n}$.

On the other hand, the moments of $p_4(z)$ will be given by

(89)    $M_n = \dfrac{2 \cdot 3^n}{(n+1)(n+2) A^n}$.

Substituting these expressions in (26) we shall easily obtain the two new
forms of the characteristic function of X, say

(90)    $\log \phi_3(t) = -m_1 + m_1 \dfrac{e^{m_2(e^{it}-1)} - 1}{m_2(e^{it}-1)}$

with

(91)    $m_1 = Am$ and $m_2 = 2\lambda/A$.

Accordingly, the generating function of the probabilities will be, say

(92)    $\psi_3(z) = \exp\left\{ -m_1 + m_1 \dfrac{e^{m_2(z-1)} - 1}{m_2(z-1)} \right\} = \sum_{n=0}^{\infty} z^n P\{X = n\}$.

The distribution determined by (92) may be called of type B.

Using the moments (89) and substituting them in the usual way in (26), we
obtain, say

(93)    $\log \phi_4(t) = -m_1 + 2 m_1 \dfrac{e^{m_2(e^{it}-1)} - 1 - m_2(e^{it}-1)}{m_2^2 (e^{it}-1)^2}$

with

(94)    $m_1 = Am$ and $m_2 = 3\lambda/A$.

The probabilities of X having any specified value will be generated by the
function, say

(95)    $\psi_4(z) = \exp\left\{ -m_1 + 2 m_1 \dfrac{e^{m_2(z-1)} - 1 - m_2(z-1)}{m_2^2 (z-1)^2} \right\} = \sum_{n=0}^{\infty} z^n P\{X = n\}$.


The probability law determined by (95) may be called of type C. The com- 
parative merits of all those distributions could be judged by comparing them 
with the results of observation. 
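As a check on the moments entering types B and C, the sketch below integrates the two densities numerically. It rests on a reading of the definitions that should be treated as an assumption: $p_3$ is taken as uniform with height A/2 on (0, 2/A), and $p_4$ as falling linearly from 2A/3 at z = 0 to zero at z = 3/A. The value A = 4 and all function names are illustrative.

```python
def midpoint_moment(density, upper, n, steps=20000):
    """Approximate the integral of z^n * density(z) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += density(z) * z ** n
    return total * h


A = 4.0  # illustrative value; the assumed densities need A >= 3


def p3(z):
    """Assumed type B density: constant on (0, 2/A)."""
    return 0.5 * A


def p4(z):
    """Assumed type C density: linear, vanishing at z = 3/A."""
    return (2.0 * A / 3.0) * (1.0 - A * z / 3.0)


def m3_closed(n):
    """Closed form 2^n / ((n+1) A^n) for the moments of p3."""
    return 2.0 ** n / ((n + 1) * A ** n)


def m4_closed(n):
    """Closed form 2 * 3^n / ((n+1)(n+2) A^n) for the moments of p4."""
    return 2.0 * 3.0 ** n / ((n + 1) * (n + 2) * A ** n)


if __name__ == "__main__":
    for n in range(4):
        print(n,
              midpoint_moment(p3, 2.0 / A, n), m3_closed(n),
              midpoint_moment(p4, 3.0 / A, n), m4_closed(n))
```

For n = 0 both columns should give unity, and for n = 1 both should give 1/A, in agreement with the restriction that the reciprocal of the first moment equals A.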


8. Illustrative Examples and Concluding Remarks. Any series of positive 
terms adding up to unity may be considered as determining a probability law 
of a discontinuous variable such as the X considered above. When trying to 
obtain probability laws fitting the empirical distributions of some particular 
origin, the distributions of the numbers of larvae in experimental plots, or the 
like, we could really start by considering series of some positive terms each 
depending on one or more parameters, say 

(96)    $u_0(m_1, m_2),\ u_1(m_1, m_2),\ u_2(m_1, m_2),\ \ldots,\ u_n(m_1, m_2),\ \ldots$

and having the property that, whatever the values of those parameters,
$\sum_{n=0}^{\infty} u_n(m_1, m_2) = 1$. Studying a considerable number of empirical distributions,
we could apply the "method" of trial and error to guess the form of dependence
of the $u_n(m_1, m_2)$ on the m's so that for a broad class of empirical distributions
there would be a system of values of the m's for which the series (96) would
satisfactorily fit the data. If we succeed in this task we shall be entitled to a
considerable satisfaction, as the solution that we obtained would permit various
further studies, e.g. the deduction of tests of significance applicable, or approximately
applicable, in various cases, and so on.

Looking back at the history of statistics we shall find that the systems of 
frequency curves of Pearson, of Bruns-Charlier and others belong to the class 
of results just discussed. They are very important, and this especially applies
to the Pearson curves, because of the empirical fact that it is but rarely
that we find in practice an empirical distribution which could not be satisfactorily
fitted by any of such curves. Consequently, wishing to deduce some test
applicable in this or that case, we may usefully assume that the basic distribution 
is one of the Pearson system and, owing to the frequently continuous character 
of the connection between the conditions and the final results, our final formula 
will be approximately valid when applied to the data under consideration. 

This point of view is not unfamiliar in pure mathematics. For example, we 
know that a broad class of functions may be approximated with any prescribed 
accuracy by means of polynomials. Wishing to prove a theorem applicable 
to this class of functions, we sometimes start by proving it for polynomials and 
then conclude that it is also true for the whole class. Here the rôle of polynomials
is perfectly analogous to that of Pearson curves and could be described
as that of good interpolation formulae.

But the problem of deducing theoretical distributions could be also considered
from a slightly different point of view. Here again we require that the theoretical
distribution fits satisfactorily the empirical data. But we may legitimately
require something else: an "explanation" of the machinery producing
the empirical distributions of a given kind. I have enclosed the word "explanation"
in quotation marks so as not to suggest that I am attaching to it too much
importance. Mathematics is always dealing with the conceptual sphere which
is quite distinct from the perceptual and, at most, admits the possibility of
establishing some correspondence. Therefore, however hard we try, we can
never produce anything like a real mathematical explanation of any phenomena
but instead only some "interpolation formula", some system of conceptions
and hypotheses, the consequences of which are approximately similar to the
observable facts. But this similarity may be differently placed. In the case of
Pearson's curves it applies to the shape of these curves and to the shape of the
empirical histograms. Otherwise it may apply to certain real features of the
phenomena studied and to some mathematically described model of the same
phenomena. And if the theoretical distributions deduced from the mathematical
model do agree with those that we observe, and if that agreement is
more or less permanent, we say that the mathematical model has "explained"
the origin of the distributions.

If the problem of deducing interpolation formulae, sufficiently flexible to
represent adequately a class of distributions, is of considerable interest, then
that of producing similar formulae but involving an "explanation" of the
phenomena studied, seems to be still more interesting. Of course, for it to be
considered as successfully solved, the theoretical distributions deduced must fit
the empirical ones, of a clearly specified kind, "practically always". At the
present time we may quote a number of instances where it was possible to establish
a mathematical probabilistic model of some class of phenomena determining
probability laws which fit the empirical distributions with a remarkable accuracy.
Perhaps the most important class of these phenomena is provided by the
Mendelian theory; a number of other examples, although of a lesser importance
but still interesting, have been mentioned elsewhere [2]. In all of them successful
checks and rechecks increase our confidence that the conclusions based on the
mathematical model determining the theoretical distributions will satisfactorily
apply to observational data and also that our interpretation of various constants
is more or less correct.


TABLE I

Distribution of European corn borers in 120 groups of 8 hills each (data
provided by Dr. Beall), fitted by Poisson Law and by type A Law with two
parameters

No. of borers    Exp. P. L.    Observed    Exp. T. A.
0                5.0           24          22.6
1                16.0          16          16.7
2                25.3          16          18.3
3                26.7          18          16.4
4                21.1          15          13.4
5                13.4          9           10.3
6                7.1           6           7.5
7                3.2           5           5.2
8                1.3           3           3.5
9                .4            4           2.3
10               .1            3           1.5
11                             0
12                             1
Beyond                         -           2.3
$m_1$            -             -           2.178
$m_2$            -             -           1.454
$P_{\chi^2}$     .000,000                  .95


TABLE II

Distribution of yeast cells in 400 squares of haemacytometer observed by
"Student" (1907), fitted by Poisson Law and by type A Law with two
parameters

No. of cells     Exp. P. L.    Observed    Exp. T. A.
0                202           213         214.8
1                138           128         121.3
2                47            37          45.7
3                11            18          13.7
4                              3           3.6
5                              1           .8
Beyond           2             -           .1
$m_1$            -             -           3.605
$m_2$            -             -           .189
$P_{\chi^2}$     >.02                      >.1

Now, what is the situation with the contagious distributions deduced above?
They do represent an attempt to give good interpolation formulae involving an
"explanation" of the observable phenomena, and all the constants introduced
have meanings which are easy to interpret. Owing to the fact that in the
process of the larvae surviving and spreading over the field there are certain
unknown features, the final general formula that we have deduced involves
two arbitrary functions $p(n)$ and $P(\xi, \eta)$. By substituting for them any appropriate
functions that the intuition may suggest, we can obtain a number of
distributions, each of which may or may not provide a satisfactory interpolation
formula. Whether they do or not must be empirically tested.

Up to the present time the contagious distributions of type A were tried on
12 distributions of larvae and on three distributions of yeast cells in squares
of the haemacytometer, which did not quite agree with the Poisson Laws.
The results of these trials were always the same: the type A distribution
with two parameters provided an excellent fit, which was never worse than that
of the more elaborate distribution with three parameters. This circumstance
seems encouraging, but future experience may be less satisfactory and it would
be very desirable to have some more empirical distributions and checks.

Tables I and II give two empirical distributions fitted with Poisson Law
and with its generalization, as provided by the type A distribution with two
parameters.

REFERENCES

[1] P. L. Hsu, "Contribution to the Theory of 'Student's' t-Test as Applied to the Problem
of Two Samples," Statistical Research Memoirs, Vol. II (1938), pp. 1-24.

[2] J. Neyman, "Lectures and Conferences on Mathematical Statistics," published by
the Graduate School of the United States Department of Agriculture, Washington,
D. C., 1938.

[3] G. Pólya, "Sur quelques points de la théorie des probabilités," Annales de l'Institut
Henri Poincaré, Vol. 1 (1931), pp. 117-162.

[4] "Student", "On the Error of Counting with a Haemacytometer," Biometrika, Vol. 5
(1907), pp. 351-364.


University of California,
Berkeley, California.



ON CONFIDENCE LIMITS AND SUFFICIENCY, WITH PARTICULAR 
REFERENCE TO PARAMETERS OF LOCATION 

By B. L. Welch 

1. Introduction. The solution of the problem of estimating an interval in 
which a population parameter should lie, by means of what is now often termed 
the fiducial type of argument, dates back to the early writers on the theory of 
errors. However, owing to their lack of "Student's" z distribution, their statements
were usually only of an approximate character, and, furthermore, the
logical distinction between the fiducial method and the method of inverse probability
was never clearly drawn before R. A. Fisher discussed the subject. It is
of interest to note how far "Student" himself went in this matter. In describing
the tables which he gave in his original paper he says:¹

"The tables give the probability that the value of the mean, measured from
the mean of the population, in terms of the standard deviation of the sample,
will lie between −∞ and z. Thus, to take the tables for samples of six, the
probability of the mean of the population lying between −∞ and once the standard
deviation of the sample is 0.9622 or the odds are about 24 to 1 that the mean
of the population lies between these limits. The probability is therefore 0.0378
that it is greater than once the standard deviation, and 0.0756 that it lies outside
±1.0 times the standard deviation."

It should be noted that "Student's" z is $(\bar{x} - \theta)/s$ where $\theta$ is the true population
mean. His tables tell us that for n = 6, $P(z < 1)$² is equal to 0.9622.
Owing to the symmetry of the z distribution this is equivalent to saying that
$P(z > -1)$ is 0.9622, i.e.

        $P\left\{ \dfrac{\bar{x} - \theta}{s} > -1 \right\} = 0.9622$.

This may be transposed to read

(1)    $P\{\theta < \bar{x} + s\} = 0.9622$

which is the statement I have italicized in the above extract, it being there understood
that the mean of the population is being measured from the mean of the
sample. "Student" therefore makes here what is now called a fiducial statement.
In the next sentence he, in effect, attaches a probability to an interval
estimate for the population mean. In doing this "Student" was not conscious
of introducing any new principle, nor does he apply the method consistently
to other problems of estimation. For instance, in discussing the estimation
of the correlation coefficient $\rho$ about the same time, he formulates the problem
in terms of inverse probability, although he was fully aware of the difficulties
involved in postulating an a priori distribution for $\rho$.

¹ "Student" (1908). "The Probable Error of a Mean." Biometrika VI, p. 20.

² P is used to denote the probability of the truth of the relation in the bracket following.
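The figure 0.9622 quoted from "Student's" tables can be checked with a short calculation. The sketch assumes, as is usual for "Student's" original z, that s is the sample standard deviation computed with divisor n, so that $z = t/\sqrt{n-1}$ with t following the t distribution on n − 1 degrees of freedom; the closed form used for five degrees of freedom is the standard finite trigonometric expression, and the function name is hypothetical.

```python
import math


def student_t_cdf_5df(t):
    """Distribution function of t with 5 degrees of freedom (closed form)."""
    theta = math.atan(t / math.sqrt(5.0))
    s, c = math.sin(theta), math.cos(theta)
    return 0.5 + (theta + s * (c + (2.0 / 3.0) * c ** 3)) / math.pi


n = 6
# z < 1 corresponds to t < sqrt(n - 1) under the divisor-n assumption
p_one_sided = student_t_cdf_5df(math.sqrt(n - 1))
p_outside = 2.0 * (1.0 - p_one_sided)

if __name__ == "__main__":
    print(round(p_one_sided, 4), round(1.0 - p_one_sided, 4), round(p_outside, 4))
```

Under these assumptions the one-sided probability reduces to $3/4 + 2/(3\pi)$, which agrees with the tabulated value to four decimals.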

In discussing the problem of interval estimation more generally, I shall adopt
some of the terminology used by J. Neyman.³ The sample observations
$x_1, x_2, \ldots, x_n$ will be denoted collectively by E (standing for the "event" point
when the observations are represented as coordinates in a space of n dimensions).
Then if $\theta$ is an unknown parameter, $\alpha$ a fixed probability, and $F(E, \theta, \alpha)$ a function
such that

(2)    $P\{F(E, \theta, \alpha) > 0\} = \alpha$

we may obtain an interval estimate for $\theta$ as follows. Let $S(E, \alpha)$ denote the
set of values of $\theta$ such that for any $\theta$ in the set we have $F(E, \theta, \alpha) > 0$. Then
if we use the notation $\{S(E, \alpha)\ C\ \theta\}$ to indicate that the set $S(E, \alpha)$ contains or
"covers" the true parameter $\theta$ we shall be able to rewrite (2)

(3)    $P\{S(E, \alpha)\ C\ \theta\} = \alpha$.

We can then adopt the following rule to obtain an interval estimate for $\theta$: (a) calculate
from the sample the set $S(E, \alpha)$, (b) make the statement that $S(E, \alpha)$
covers $\theta$. In adopting this rule we shall be right in the proportion $\alpha$ of cases.

There are, in general, an infinite number of ways in which we can start with 
a statement of the type (2) to reach the statement of type (3). Neyman has 
discussed methods of making the best choice between such statements. His 
approach to this problem may be illustrated by the following example. 

Suppose we have a random sample of n from a normal population with standard
deviation $\sigma$ and let

        $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$,

and w = range = largest x − smallest x.

Then we can find a constant $b_\alpha$ such that

(4)    $P\left\{ \dfrac{s}{\sigma} > b_\alpha \right\} = \alpha$

and, turning this round, we obtain

(5)    $P\left\{ \sigma < \dfrac{s}{b_\alpha} \right\} = \alpha$.

This means that, if we choose $\alpha$ = .99 (say), then we can say that $\sigma$ is less than
$s/b_{.99}$ and in 99% of cases we shall be correct in this statement.

³ J. Neyman (1937). "Outline of a theory of statistical estimation based on the classical
theory of probability." Phil. Trans. Roy. Soc. A 236, pp. 333-380.

Now similarly we can find $c_\alpha$ such that

(6)    $P\left\{ \dfrac{w}{\sigma} > c_\alpha \right\} = \alpha$

and reversing this

(7)    $P\left\{ \sigma < \dfrac{w}{c_\alpha} \right\} = \alpha$.

This statement is not inconsistent with (5). It means that, if we choose to base
our rule of estimation always on the range, then in 99% of cases we shall be
correct in saying that $\sigma < w/c_{.99}$. On the other hand, (5) relates to the consequences
of applying always a rule of estimation based on the standard deviation
of the sample. Both (5) and (7) are in themselves true statements, but we must
decide which of them is the better one to use. In certain circumstances speed
of calculation may be the determining factor, in which case (7) may be preferable,
but here we shall assume that the time spent on calculation is not important.

In making the statement that $\sigma$ is less than some upper limit which is a function
of the sample observations, we shall, in general, prefer that this upper
limit be placed as low as possible consistent with the chosen confidence coefficient
$\alpha$. We find, however, that it is not possible to say that, whatever the
sample obtained, $s/b_\alpha$ will be less than $w/c_\alpha$ or vice versa. We must, therefore,
approach the problem from another angle. If $\sigma'$ is a value greater than the true
standard deviation $\sigma$ we can theoretically evaluate the probability that $\sigma' < s/b_\alpha$,
and similarly the probability that $\sigma' < w/c_\alpha$. We may now express our general
desire to place the upper confidence limit for $\sigma$ as low as possible in a more concrete
form. We may ask that the probability that $\sigma'$ is less than this limit should
be as small as possible. We find in the present problem that, whatever $\sigma' > \sigma$,
we should include $\sigma'$ in the interval from 0 to $s/b_\alpha$ less frequently than we should
in an interval based on any other statistic. This constitutes an argument for
using s rather than any other statistic such as w.
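The comparison between the two upper limits can be illustrated by simulation. Everything specific in the sketch is an assumption made for illustration: the sample size n = 10, the confidence coefficient 0.9, the false value $\sigma' = 1.5\sigma$, the number of repetitions, and the device of estimating the constants $b_\alpha$ and $c_\alpha$ as empirical quantiles from a separate calibration run rather than from tables.

```python
import random
import statistics


def compare_upper_limits(n=10, alpha=0.9, sigma=1.0, sigma_false=1.5,
                         reps=4000, seed=1):
    """Monte Carlo frequencies with which s/b and w/c exceed sigma and sigma'."""
    rng = random.Random(seed)

    def draw():
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        # statistics.stdev uses the divisor n - 1, as in the definition of s
        return statistics.stdev(xs), max(xs) - min(xs)

    # Calibration: b and c are the (1 - alpha) quantiles of s/sigma and w/sigma,
    # so that P{s/sigma > b} and P{w/sigma > c} are both close to alpha.
    calib = [draw() for _ in range(reps)]
    k = int((1.0 - alpha) * reps)
    b = sorted(s / sigma for s, _ in calib)[k]
    c = sorted(w / sigma for _, w in calib)[k]

    fresh = [draw() for _ in range(reps)]

    def freq(limit_of, value):
        return sum(value < limit_of(s, w) for s, w in fresh) / reps

    return {
        "cover_true_s": freq(lambda s, w: s / b, sigma),
        "cover_true_w": freq(lambda s, w: w / c, sigma),
        "cover_false_s": freq(lambda s, w: s / b, sigma_false),
        "cover_false_w": freq(lambda s, w: w / c, sigma_false),
    }


if __name__ == "__main__":
    for name, value in compare_upper_limits().items():
        print(name, value)
```

Both rules should cover the true $\sigma$ in roughly the proportion $\alpha$ of samples; the quantity of interest for the argument above is how often each rule also admits the false value $\sigma'$. A simulation of this size only suggests the ordering and does not establish it.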

In general, Neyman makes all problems of choosing between alternative 
procedures of interval estimation depend on the probability that the intervals 
include values of the parameter different from the true value, as well as on the 
probability of them containing the true value. This principle of choice does, 
I think, appear reasonable, although its application is not, of course, so straight- 
forward when statistics with properties of sufficiency similar to those of s do 
not exist. It is then necessary to introduce other conditions into the formula- 
tion of the problem. I intend to discuss elsewhere ways in which this has 
been done. 

To summarize, we may say: (a) we can make many true statements of the
type (3); and (b) if we can agree on certain further properties which these statements
should possess, we can choose which is the best statement of this type to
adopt as our general rule for interval estimation. There are certain differences
between this approach and that of R. A. Fisher, whose attitude is expressed
clearly in his contribution to the discussion following Neyman's paper⁴ "On the
two different aspects of the representative method." Fisher says there that:
"In particular he would apply the fiducial method, or rather would claim unique
validity for its results, only in those cases for which the problem of estimation
proper had been completely solved, i.e. either when there existed a statistic of
the kind called sufficient, which in itself contained the whole of the information
supplied by the data, or when, though there was no sufficient statistic, yet the
whole of the information could be utilized in the form of ancillary information."
Thus it appears that when sufficient statistics do not exist, excepting in those
further cases where Fisher claims that the problem of estimation has been completely
solved, he would definitely discourage the use of the fiducial argument
at all. Neyman, on the other hand, would allow the attempt to obtain interval
estimates on the lines described above. Where sufficient statistics do exist,
the two approaches do not lead to any final disagreement. Neyman, using
results obtained in the Neyman-Pearson theory of testing hypotheses, is led to
criteria depending in a particular way on the joint probability law of the sample,
and these criteria are seen to involve the sample values only through statistics
which have been defined as sufficient. One may regard this fact in two ways:
(a) one may say that because a certain line of approach, which seems intuitionally
sound, leads to the use of statistics which have been defined as sufficient,
therefore this definition of sufficiency is a good one, or (b) one may say that the
definition of sufficient statistics is fundamental, and that any method of approach
which leads to their use has thereby obtained some extra support.

There remains the case alluded to above, where the joint probability law of
the sample does not depend on the unknown parameter $\theta$ by way of one statistic
only, but where nevertheless it has been said that the problem of estimation
has been completely solved. This case will be discussed in the next section.

2. Interval Estimates of Location. R. A. Fisher has given, as a particular
example, a case where the unknown parameter is one of location, so that we can
write

        $p(x \mid \theta) = \phi(x - \theta)$.

Now if we have a sample of n from this distribution, the (n − 1) differences
between successive observations when arranged in order of magnitude will have
a joint distribution independent of $\theta$. Hence if we denote the sample by E,
and the (n − 1) differences jointly by C, we have

(8)    $p(E \mid \theta) = p(T \mid C, \theta)\, p(C)$

where T is some statistic, such as the mean or median, whose distribution does
depend on $\theta$ and may hence be taken as an estimate of $\theta$. We may therefore
read (8) as follows: the joint probability law of the sample is equal to the probability
law of the estimate in samples of the same configuration, C, multiplied
by the probability of the configuration, the latter not depending on the unknown
$\theta$. From this it has been deduced that all the information respecting $\theta$
provided by the sample is given by referring T to the distribution $p(T \mid C, \theta)$.
Fisher,⁵ for instance, says that "in interpreting our estimate (we) may take as its
sampling distribution that appropriate to only those samples which have the
actual configuration observed." Later in the same context he remarks that in
general, when $\theta$ is a parameter of any type whatever, and not necessarily one of
location or scaling, if something can be found "corresponding with the configuration
of the sample in the simple case discussed above, . . . one of the
primary problems of uncertain inference will have reached its complete solution.
If not, there must remain some further puzzles to unravel."

⁴ J. Neyman (1934). J. R. Statist. Soc. 97, p. 617.

It is clear, therefore, that more has been claimed for this method than that it is 
practically useful, or that it yields the best results possible in large samples, or 
that it yields results highly approximating the best possible in small samples.
There is an emphasis here on completeness that leads one to suppose that all 
problems of estimation and testing hypotheses may be answered to the best 
advantage by considering only the distribution of an estimate in samples of the 
same configuration, the estimate thus attaining properties analogous to those 
of a sufficient statistic. That this supposition is not true may be seen by con- 
sidering the following simple example. This example concerns the simplest 
situation that one deals with in the theory of testing statistical hypotheses. 
Its relevance to the problem of interval estimation will, however, not be difficult 
to see. 

Suppose that we have a sample from a population involving only a parameter
of location $\theta$, and that we wish to test whether $\theta$ is equal to $\theta_0$ (say), and that
besides $\theta_0$ there is only one value $\theta_1$ (say) which it is possible for $\theta$ to take.
Suppose we require to set up a statistical test which will reject the hypothesis
$\theta = \theta_0$ in only a small proportion $\epsilon$ of cases, when it is true. Many such tests
are possible, and it is natural to choose from them that test which will lead most
frequently to the rejection of the hypothesis that $\theta = \theta_0$ when the single
alternative $\theta = \theta_1$ is true. Neyman and Pearson⁶ have shown that the best test from
this viewpoint is provided by the criterion


$$J = \frac{p(E \mid \theta_1)}{p(E \mid \theta_0)}. \tag{9}$$

This criterion must be referred to its distribution in all samples when $\theta = \theta_0$.
We must therefore choose a constant $J_\epsilon$ such that

$$P(J > J_\epsilon \mid \theta = \theta_0) = \epsilon, \tag{10}$$

and reject the hypothesis that $\theta = \theta_0$ when $J > J_\epsilon$. This is known to be the
best test in these circumstances, and we may demand that any other procedure
which claims to use the data exhaustively should be equivalent to it. Now if we
decide to use only the distribution of the statistic $T$ in samples of the same
configuration, we are led to take as the most powerful test based on $T \mid C$ one
which would reject the hypothesis that $\theta = \theta_0$ when the ratio of $p(T \mid C, \theta_1)$ to
$p(T \mid C, \theta_0)$ exceeds a certain value. Now by (8) this ratio is exactly the criterion
$J$ of (9) above. There is, however, this difference, that $J$ has now to be referred
to its distribution in samples with the same configuration $C$ as that observed.
We shall therefore have to choose $J_\epsilon(C)$ such that

$$P\{J > J_\epsilon(C) \mid C, \theta_0\} = \epsilon. \tag{11}$$

⁵ Fisher, R. A. (1936). “Uncertain Inference.” Proc. Amer. Acad. Arts and Sciences,
71, No. 4, p. 267.

⁶ J. Neyman and E. S. Pearson (1932). “On the problem of the most efficient tests of
statistical hypotheses.” Phil. Trans. Roy. Soc. A 231, p. 300.

A test, then, which rejects the hypothesis that $\theta = \theta_0$ when $J > J_\epsilon(C)$ will
be such that it is the most powerful possible with respect to the alternative
$\theta = \theta_1$, based on samples with the same configuration. However, in actual
sampling from a population, we derive samples with all configurations, and the
real power of the test will therefore be measured by

$$P\{J > J_\epsilon(C) \mid \theta_1\} = \int P\{J > J_\epsilon(C) \mid C, \theta_1\}\, p(C)\, dC. \tag{12}$$

This quantity cannot be greater, and will in general be less, than the power⁷
of the other test, viz. $P\{J > J_\epsilon \mid \theta_1\}$. (If $J_\epsilon(C)$ is the same for all $C$, and
therefore equal to $J_\epsilon$, the powers will be equal. This will be the case when there is a
sufficient statistic for $\theta$.) We must therefore conclude that, in relation to this
simple problem at least, a method which takes account only of distributions in
samples with the same configuration will not use the data to the best advantage.

Of course the type of problem to be solved is usually not so straightforward
as the present one. There will usually be more than one value of $\theta$ alternative
to $\theta_0$, and no uniformly most powerful test will, in general, exist. It is
legitimate, however, to consider the above example, because any procedure claiming
properties of sufficiency should be able to deal with it in the best possible way.

An example may make the above points clearer, and will show their relevance
to the problem of interval estimation. Consider a rectangular distribution
with mean $\theta$, and range from $(\theta - \tfrac{1}{2})$ to $(\theta + \tfrac{1}{2})$. Let $x_1$ and $x_2$ be a sample of 2
from this population, and suppose we require confidence limits for $\theta$ such that
the chance of them enclosing $\theta$ is $\alpha$.

If we represent $x_1$ and $x_2$ as coordinates of a point with respect to rectangular
axes, the joint probability distribution is constant over a square centered at the
point $(\theta, \theta)$. This is shown by ABCD in Fig. 1. We have

$$p(x_1, x_2)\,dx_1\,dx_2 = dx_1\,dx_2, \qquad
\begin{cases} \theta - \tfrac{1}{2} < x_1 < \theta + \tfrac{1}{2}, \\ \theta - \tfrac{1}{2} < x_2 < \theta + \tfrac{1}{2}. \end{cases} \tag{13}$$

⁷ Power is used throughout in the Neyman-Pearson sense, i.e. to denote the chance of a
test rejecting a hypothesis when a given alternative is actually true.

If we write $z_1 = \tfrac{1}{2}(x_1 + x_2)$, $z_2 = \tfrac{1}{2}(x_1 - x_2)$, $z_2$ will represent the configuration
of the sample, and $z_1$ may be taken as the estimate, $T$, of $\theta$ in our discussion
above. We can then show that

$$p(z_1, z_2)\,dz_1\,dz_2 = 2\,dz_1\,dz_2, \tag{14}$$

$$p(z_2)\,dz_2 = 2\{1 - 2|z_2|\}\,dz_2, \qquad -\tfrac{1}{2} < z_2 < \tfrac{1}{2}, \tag{15}$$

and

$$p(z_1 \mid z_2)\,dz_1 = \frac{dz_1}{1 - 2|z_2|}, \qquad \theta - \tfrac{1}{2} + |z_2| < z_1 < \theta + \tfrac{1}{2} - |z_2|. \tag{16}$$

That these are the correct limits for $z_1$ and $z_2$ may be seen by reference to Fig. 1,
noting that $z_1$ and $z_2$ are constant along lines parallel to the respective diagonals
BD and AC of the square.




First let us confine ourselves to samples with the same configuration $z_2$.
Then, from (16), we can say that

$$P\{\theta - \alpha(\tfrac{1}{2} - |z_2|) < z_1 < \theta + \alpha(\tfrac{1}{2} - |z_2|)\} = \alpha. \tag{17}$$

This statement is true for given $z_2$, and will be a fortiori true when this restriction
is removed. It is equivalent to saying that the chance⁸ of a point falling into
the shaded area in Fig. 1 is $(1 - \alpha)$, where $\alpha$ denotes the proportion of the
diagonal AC lying in the non-shaded area.⁹ Confidence limits for $\theta$ are then
obtained by transposing (17), giving

$$P\{z_1 - \alpha(\tfrac{1}{2} - |z_2|) < \theta < z_1 + \alpha(\tfrac{1}{2} - |z_2|)\} = \alpha. \tag{18}$$


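⁸ We are assuming that confidence limits are required such that the chance is
$\tfrac{1}{2}(1 - \alpha)$ of $\theta$ being above the upper limit, and $\tfrac{1}{2}(1 - \alpha)$ of it being
below the lower limit.

That this is not the best way of constructing confidence limits is seen as follows.
Let us denote the lesser of $x_1$ and $x_2$ by $x_L$, and the greater by $x_G$. Then if we
consider the possible values of $x_L$ and $x_G$ which will satisfy simultaneously the
inequalities

$$\begin{aligned}
\theta - \tfrac{1}{2} &< x_L < \theta + \tfrac{1}{2} - \sqrt{\tfrac{1}{2}(1 - \alpha)}, \\
\theta - \tfrac{1}{2} + \sqrt{\tfrac{1}{2}(1 - \alpha)} &< x_G < \theta + \tfrac{1}{2},
\end{aligned} \tag{19}$$

we see that they lie in the non-shaded area of the square ABCD in Fig. 2, where
the sides of the shaded squares are $\sqrt{\tfrac{1}{2}(1 - \alpha)}$. The chance of the inequalities
holding simultaneously is therefore $\alpha$. Further we see that these inequalities
can be transposed to read

$$\begin{aligned}
x_G - \tfrac{1}{2} < \theta < x_L + \tfrac{1}{2} \qquad &\text{when } (x_G - x_L) > \sqrt{\tfrac{1}{2}(1 - \alpha)}, \\
x_L - \tfrac{1}{2} + \sqrt{\tfrac{1}{2}(1 - \alpha)} < \theta < x_G + \tfrac{1}{2} - \sqrt{\tfrac{1}{2}(1 - \alpha)} \qquad &\text{when } (x_G - x_L) < \sqrt{\tfrac{1}{2}(1 - \alpha)},
\end{aligned} \tag{20}$$

and therefore we can take these to define our confidence limits for $\theta$.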

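The intervals defined by the confidence limits in (18) and (20) are equivalent
in the sense that each covers the true value of $\theta$ in a proportion $\alpha$ of cases. To
decide which is the better rule of interval estimation we shall follow Neyman,
and consider how often the intervals cover values other than the true $\theta$. In
particular let $(\theta + \Delta)$ be any other value, and consider the expressions $P_1$ and
$P_2$ where

$$P_1 = P\{z_1 - \alpha(\tfrac{1}{2} - |z_2|) < (\theta + \Delta) < z_1 + \alpha(\tfrac{1}{2} - |z_2|)\} \tag{21}$$

and $P_2$ is the probability that one or another of the following inequalities holds

$$\begin{aligned}
x_G - \tfrac{1}{2} < (\theta + \Delta) < x_L + \tfrac{1}{2} \qquad &\text{when } (x_G - x_L) > \sqrt{\tfrac{1}{2}(1 - \alpha)}, \\
x_L - \tfrac{1}{2} + \sqrt{\tfrac{1}{2}(1 - \alpha)} < (\theta + \Delta) < x_G + \tfrac{1}{2} - \sqrt{\tfrac{1}{2}(1 - \alpha)} \qquad &\text{when } (x_G - x_L) < \sqrt{\tfrac{1}{2}(1 - \alpha)}.
\end{aligned} \tag{22}$$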

Now (21) can be written

$$P_1 = P\{(\theta + \Delta) - \alpha(\tfrac{1}{2} - |z_2|) < z_1 < (\theta + \Delta) + \alpha(\tfrac{1}{2} - |z_2|)\}. \tag{23}$$

Referring to Fig. 1 we see that we have to evaluate the chance of the sample
falling into a lozenge-shaped area like the unshaded area in ABCD, but moved
bodily along the diagonal AC to such a position as is indicated by the dotted
lines. Difficulties are introduced by the discontinuities, but we can show that
for $\Delta$ positive



with similar expressions for $\Delta$ negative. The graph of $P_1$ against $\Delta$ is shown
in Fig. 3, $\alpha$ for convenience being taken = 0.92. From it we can read off the
probability of the confidence interval covering $(\theta + \Delta)$, where $\theta$ is the true value
of the parameter.

Similar calculations may be made for $P_2$. Without going into details, it is
seen that



$P_2$ is plotted against $\Delta$ in Fig. 3. It is seen that, whatever value of $\Delta$ we take,
the chance of $(\theta + \Delta)$ being included in the confidence interval is less for the
second method of estimation than it is for the method based on the distribution
of $z_1 \mid z_2$. This circumstance would, I think, contradict the view that the latter
method was deriving the utmost from the sample. Whether the method is
still a good one, though not necessarily the best, is not a question at issue in the
present paper. The curves in Fig. 3 are very close together, and we are led to
expect this by the fact that (12) is the weighted mean of the powers within the
separate configurations, the weights being the probabilities $p(C)$ of the
configurations. I am only concerned to show that certain methods, for which
properties analogous to those of sufficiency have been claimed, do not satisfy
conditions which I think they should, if these claims are to be upheld.

⁹ It will be noted that, when inverted, the curves of Fig. 3 represent the power
functions of tests for which the regions of rejection are those in Figs. 1 and 2 respectively,
the test being whether the parameter has the specified value $\theta$, and different alternative
hypotheses being represented by $(\theta + \Delta)$.
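The comparison between the two rules of interval estimation can also be checked numerically. What follows is a minimal Monte Carlo sketch and not part of the original argument: it assumes $\theta = 0$ and $\alpha = 0.92$, and it takes the side of the shaded squares in Fig. 2 to be $\sqrt{(1 - \alpha)/2}$, the value for which the two corner squares have total area $1 - \alpha$. The function names are illustrative only.

```python
import math
import random


def interval_conditional(x1, x2, alpha):
    """Limits of the form (18): based on z1 within the configuration z2."""
    z1 = 0.5 * (x1 + x2)
    z2 = 0.5 * (x1 - x2)
    half_width = alpha * (0.5 - abs(z2))
    return z1 - half_width, z1 + half_width


def interval_unconditional(x1, x2, alpha):
    """Limits of the form (20): based on the smaller and larger observation."""
    side = math.sqrt(0.5 * (1.0 - alpha))  # assumed side of the shaded squares
    x_low, x_high = min(x1, x2), max(x1, x2)
    if x_high - x_low > side:
        return x_high - 0.5, x_low + 0.5
    return x_low - 0.5 + side, x_high + 0.5 - side


def coverage(delta, alpha=0.92, theta=0.0, n=200_000, seed=0):
    """Estimate P1 and P2, the chances that each interval covers theta + delta."""
    rng = random.Random(seed)
    target = theta + delta
    hits_1 = hits_2 = 0
    for _ in range(n):
        # Two observations from the rectangular distribution on (theta - 1/2, theta + 1/2).
        x1 = theta + rng.random() - 0.5
        x2 = theta + rng.random() - 0.5
        low, high = interval_conditional(x1, x2, alpha)
        hits_1 += low < target < high
        low, high = interval_unconditional(x1, x2, alpha)
        hits_2 += low < target < high
    return hits_1 / n, hits_2 / n


if __name__ == "__main__":
    for delta in (0.0, 0.1, 0.3):
        p1, p2 = coverage(delta)
        print(f"delta={delta:.1f}  P1={p1:.4f}  P2={p2:.4f}")
```

Under these assumptions both rules should cover the true value ($\Delta = 0$) in about 92 per cent of samples, while for $\Delta = 0.1$ the second rule should cover the false value somewhat less often than the first (roughly 0.76 against 0.77), in line with the statement that the curves lie close together.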

3. Fiducial Distributions. In the first section of this paper I discussed certain
points of difference between the approaches to the problem of interval estimation
made by R. A. Fisher on the one hand and J. Neyman and E. S. Pearson on the
other. The differences are not, perhaps, of the same magnitude as those between
all these writers and the protagonists of inverse probability, and the results
reached are so often the same that the reader may be excused for being
somewhat impatient with what appear to be rather fine distinctions. However,
as was seen in the last section, the approaches do not always yield exactly the
same final results, and therefore I think it may be profitable to discuss them
still further.

Closely connected with Fisher’s desire to restrict the use of the fiducial method 
to situations where statistics exist which possess some property of sufficiency,
is his introduction of the concept of a fiducial distribution for the unknown 
parameter. One can talk about the fiducial distribution for a parameter only 
if it is a unique distribution. Neyman, however, never makes use of fiducial 
distributions, and would, I think, claim that any valid results reached with the 
concept can equally well be reached without it. Where the results are the same 
there is room for two opinions on this matter. Some writers find it convenient 
to think in terms of fiducial distributions, and others prefer always to carry
forward their reasoning as far as possible in terms of direct probability
statements about the observational values, before transposing them to obtain
confidence or fiducial limits for the parameters.

Greater objection can be made to the use of simultaneous fiducial distributions
of several parameters. For instance, in the case of the normal distribution
with parameters $\mu$ and $\sigma$, a simultaneous fiducial distribution has been defined
in the following way.¹⁰ Starting with the fact that the joint distribution of

$$\phi_1 = \frac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma}, \qquad \phi_2 = \frac{(n-1)s^2}{\sigma^2}$$

is

$$\frac{1}{2^{n/2}\,\Gamma(\tfrac{1}{2})\,\Gamma\!\left(\tfrac{n-1}{2}\right)}\;\phi_2^{(n-3)/2}\, e^{-\frac{1}{2}\phi_1^2 - \frac{1}{2}\phi_2}\, d\phi_1\, d\phi_2,$$

$\bar{x}$ and $s$ are treated formally as fixed, and $\phi_1$ and $\phi_2$ are transformed to $\mu$ and $\sigma$,
treated formally as variables. This gives

$$df = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}} \cdot \frac{2}{\Gamma\!\left(\tfrac{n-1}{2}\right)} \left\{\frac{(n-1)s^2}{2\sigma^2}\right\}^{\frac{n-1}{2}} \frac{1}{\sigma}\, e^{-\frac{(n-1)s^2}{2\sigma^2}}\, d\mu\, d\sigma. \tag{26}$$

This distribution would be useful if it were legitimate to integrate it out to obtain
a fiducial distribution for any function $g(\mu, \sigma)$ say, of $\mu$ and $\sigma$. However, as for
instance Bartlett has pointed out, this is not necessarily permissible. It seems
to me therefore, that distributions defined as in (26) should be dispensed with
entirely, for their very form encourages the belief that they can be integrated
out at will. That this belief is still held is illustrated by a recent paper by
Miss D. M. Starkey¹¹ concerned with the difference between the means of normal
populations where the standard deviations are not assumed equal. This is the
original problem to which Fisher¹² applied a method equivalent to integrating
out the joint fiducial distribution of the two population means. Bartlett¹³
raised an objection to this method of treatment, and I have also discussed the
matter further.¹⁴ Miss Starkey proceeds from the assumption that Fisher’s
method is sound.

The concept of the fiducial distribution has also been used in those problems
of location and scaling, which have been treated by the procedure discussed
above, of considering distributions in samples with the same configuration.
Indeed it is one of the attractions of this procedure that we are led to
distributions with, so to speak, one degree of freedom, so that the fiducial method may be
safely applied. However, although probability statements based on such a
fiducial method are here quite valid, I do not think that such statements can
claim a unique validity. As I have shown in the previous section, there is no
necessity to confine oneself to sampling within a configuration in order to obtain
interval estimates for parameters, and we may fare better by not so confining
ourselves, even if we have to dispense with the fiducial distribution.

¹⁰ R. A. Fisher (1935). “The fiducial argument in statistical inference.” Ann. Eugen.,
VI, p. 395.

¹¹ Daisy M. Starkey (1937). “A test of the significance of the difference between means
of samples from two normal populations without assuming equal variances.” Ann. Math.
Stat., Vol. IX, No. 3, pp. 201-213.

¹² R. A. Fisher (1936). loc. cit.

¹³ M. S. Bartlett (1936). “The information available in small samples.” Proc. Camb.
Phil. Soc., 32, pp. 560-566.

¹⁴ B. L. Welch (1937). “The significance of the difference between two means when the
population variances are unequal.” Biometrika, XXIX, p. 358.

4. Summary. Certain points which arise in the problem of estimating an
interval in which a population parameter should lie have been discussed. In
the second section it has been shown that in estimating location parameters
it is not sufficient to consider the distribution of estimates in samples of the
same configuration, meaning by sufficient that the sample is thereby utilized
in the best possible way.

University College,

London



THE REGRESSION SYSTEMS OF TWO SUMS HAVING RANDOM 
ELEMENTS IN COMMON 

By J. F. Kenney 


1. Introduction. The purpose of this note is to illustrate the power and
elegance of the technique of characteristic functions¹ in solving a problem which
has been discussed in the literature by Fischer² and others.

Let $x_1, x_2, \ldots, x_n$ be $n$ variables independent of each other in the statistical
sense, all subject to the same distribution function $f$, so that the function
representing their joint distribution is

$$f(x_1)\,f(x_2) \cdots f(x_n). \tag{1}$$

Under these conditions a set of values $x_1, x_2, \ldots, x_n$ will be said to constitute
a sample of $n$ from a population with distribution function $f(x)$ and the function
(1) will be said to represent the distribution of samples. It will be understood
that $f(x)$ is defined and is non-negative for all real values of $x$ and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

If the actual occurrence of the variable is limited to a finite range, $f(x)$ is defined
as identically zero outside that range.

The mathematical expectation of an arbitrary function $\psi(x)$, denoted by
application of the operator $E$, is

$$E[\psi(x)] = \int_{-\infty}^{\infty} \psi(x)\,f(x)\,dx. \tag{2}$$

This integral will be convergent whenever $\psi(x)$ is absolutely integrable and
bounded. In particular, if $\psi(x) = x$ we have the mean

$$a = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

and it will be assumed that $a$ exists.

Suppose a sample of $n$ is taken from the population represented by $f(x)$ and
the sum

$$y = x_1 + x_2 + \cdots + x_k + x_{k+1} + \cdots + x_n \tag{3}$$

is formed. From this sample $k < n$ values are chosen at random, and a sample
of $m - k$ ($m \le n$) additional values, $x'_j$, is taken from $f(x)$. The sum

$$z = x_1 + x_2 + \cdots + x_k + x'_{k+1} + \cdots + x'_m \tag{4}$$

is then formed. The problem is to determine the regression systems of $z$ on $y$
and $y$ on $z$ in the population resulting from repeated samples.

Before proceeding with the solution a brief discussion of characteristic
functions will be given.

¹ The writer takes pleasure in acknowledging his indebtedness to Professor A. T. Craig
for suggesting this method.

² “On correlation surfaces of sums with a certain number of random elements in
common,” these Annals, vol. 4, no. 2, pp. 103-126.


2. Characteristic functions. When $\psi(x) = e^{itx}$, where $t$ is a real variable and
$i = \sqrt{-1}$, (2) is called the characteristic function of $x$. Thus if we let
$\varphi(t) = E[e^{itx}]$ we have

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx.$$

From the conditions imposed on $f(x)$ it follows that the integral defining $\varphi(t)$
is convergent and $|\varphi(t)| \le 1$. If the $k$th derivative of $\varphi(t)$ with respect to $t$
exists we have

$$\left[\frac{d^k \varphi(t)}{dt^k}\right]_{t=0} = i^k \nu_k$$

where

$$\nu_k = \int_{-\infty}^{\infty} x^k f(x)\,dx.$$

Thus the characteristic function of $x$ has the property that its $k$th derivative
at the origin (divided by $i^k$) gives the $k$th moment of the distribution of $x$ about
the origin of $x$.
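The moment property just stated can be illustrated numerically. The following is a minimal sketch under assumptions not in the original: it takes $f(x)$ to be the rectangular density on $(0, 1)$, approximates the integral defining $\varphi(t)$ by the midpoint rule, and replaces the derivatives at the origin by central differences. The function names and the step sizes are illustrative choices.

```python
import cmath


def char_fn(t, density, lo, hi, steps=2000):
    """Midpoint-rule approximation to the integral of e^{itx} f(x) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0j
    for j in range(steps):
        x = lo + (j + 0.5) * width
        total += cmath.exp(1j * t * x) * density(x)
    return total * width


def first_two_moments(density, lo, hi, h=1e-3):
    """Estimate nu_1 and nu_2 from finite-difference derivatives of phi at t = 0."""
    def phi(t):
        return char_fn(t, density, lo, hi)

    d1 = (phi(h) - phi(-h)) / (2 * h)               # approximates phi'(0)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # approximates phi''(0)
    nu1 = (d1 / 1j).real        # first derivative divided by i
    nu2 = (d2 / 1j**2).real     # second derivative divided by i^2
    return nu1, nu2


def rectangular(x):
    """Density of the rectangular distribution on (0, 1); an assumed example."""
    return 1.0


nu1, nu2 = first_two_moments(rectangular, 0.0, 1.0)
print(nu1, nu2)
```

For the rectangular density the exact moments are $\nu_1 = 1/2$ and $\nu_2 = 1/3$, so the two printed values should be close to these.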

The notion of characteristic function extends readily to a distribution of
several variables. In particular, let $F(y, z)$ be the joint distribution function
of variables $y$ and $z$ subject to the condition

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(y, z)\,dy\,dz = 1.$$

Then the characteristic function of $F(y, z)$ is

$$\varphi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1 y + t_2 z)} F(y, z)\,dy\,dz \tag{5}$$

where $y$ and $z$ are defined in (3) and (4).


3. Solution of the problem. The distribution function associated with the
population of samples is of the form given by (1). Consequently, the
characteristic function of $F(y, z)$ can be written in the form

$$\int \cdots \int \prod_{j=1}^{k} e^{i(t_1 + t_2)x_j} f(x_j)\,dx_j \prod_{j=k+1}^{n} e^{it_1 x_j} f(x_j)\,dx_j \prod_{j=k+1}^{m} e^{it_2 x'_j} f(x'_j)\,dx'_j,$$

the limits of integration being taken over all admissible values of the variables.
The above expression reduces to

$$\varphi(t_1, t_2) = [\varphi(t_1 + t_2)]^{k}\,[\varphi(t_1)]^{n-k}\,[\varphi(t_2)]^{m-k}. \tag{6}$$

By the Fourier transform we have from (5),

$$F(y, z) = \left(\frac{1}{2\pi}\right)^{2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(t_1 y + t_2 z)}\,\varphi(t_1, t_2)\,dt_1\,dt_2.$$

Since a distribution is completely determined by its characteristic function,
$F(y, z)$ can be exhibited if $f(x)$ is known. However, the solution of the problem
does not depend upon exhibiting $F(y, z)$.

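Let $g(y)$ and $h(z)$ be the marginal distributions of $y$ and $z$, respectively.
Then the mean value of $z$ for a fixed $y$ is

$$\bar{z}_y = \frac{1}{g(y)} \int z\,F(y, z)\,dz \tag{7}$$

and the mean value of $y$ for a fixed $z$ is

$$\bar{y}_z = \frac{1}{h(z)} \int y\,F(y, z)\,dy \tag{8}$$

where here and subsequently the integration is taken over all admissible values
of the variables.

Let us now take the partial derivative of $\varphi(t_1, t_2)$, as given in (5), with respect
to $t_2$ and evaluate the result at $t_2 = 0$. We obtain

$$\left[\frac{\partial \varphi(t_1, t_2)}{\partial t_2}\right]_{t_2 = 0} = \int\!\!\int i z\, e^{it_1 y} F(y, z)\,dy\,dz. \tag{9}$$

If we denote the left member of (9) by $G(t_1)$ and utilize (7) in the right member,
(9) becomes

$$G(t_1) = \int g(y)\,\bar{z}_y\, i\, e^{it_1 y}\,dy.$$

Application of the Fourier transform yields

$$i\,g(y)\,\bar{z}_y = \frac{1}{2\pi} \int e^{-it_1 y}\,G(t_1)\,dt_1. \tag{10}$$

Now from (6),

$$G(t_1) = k\,\varphi(t_1)^{n-1}\varphi'(t_1) + \varphi(t_1)^{n}\, i a (m - k).$$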

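Therefore (10) may be written as follows,

$$i\,g(y)\,\bar{z}_y = \frac{k}{2\pi} \int e^{-it_1 y}\,\varphi(t_1)^{n-1}\varphi'(t_1)\,dt_1 + \frac{i a (m - k)}{2\pi} \int e^{-it_1 y}\,\varphi(t_1)^{n}\,dt_1. \tag{11}$$

To evaluate these integrals, consider

$$\varphi(t_1)^{n} = \int e^{it_1 y} g(y)\,dy. \tag{12}$$

Differentiating (12) with respect to $t_1$ we have

$$n\,\varphi(t_1)^{n-1}\varphi'(t_1) = \int i y\, e^{it_1 y} g(y)\,dy. \tag{13}$$

Again using the Fourier transform, we obtain

$$i y\, g(y) = \frac{n}{2\pi} \int e^{-it_1 y}\,\varphi(t_1)^{n-1}\varphi'(t_1)\,dt_1$$

from (13) and

$$g(y) = \frac{1}{2\pi} \int e^{-it_1 y}\,\varphi(t_1)^{n}\,dt_1$$

from (12). Therefore (11) reduces to

$$i\,g(y)\,\bar{z}_y = \frac{k}{n}\, i y\, g(y) + i a (m - k)\, g(y)$$

and we have at once the simple result

$$\bar{z}_y = ky/n + a(m - k). \tag{14}$$

In an analogous manner, it may be shown that

$$\bar{y}_z = kz/m + a(n - k). \tag{15}$$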

Writing (14) and (15) in the forms

$$\bar{z}_y - am = c_1 (y - an)$$

$$\bar{y}_z - an = c_2 (z - am)$$

where $c_1 = k/n$ and $c_2 = k/m$ are the regression coefficients, it follows from the
linearity of the regressions that the correlation coefficient is

$$\rho = \sqrt{c_1 c_2} = k/\sqrt{mn}.$$

If $m = n$, we have a well known result which is sometimes stated as follows:
If $y$ and $z$ are affected by $n$ equally likely causes of which $k$ are common to
both, then the correlation coefficient between $y$ and $z$ is equal to $k/n$.
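The results (14) and the expression for $\rho$ can be checked by simulation. The sketch below is an illustration under assumed values, not part of the original derivation: it takes $f(x)$ to be rectangular on $(0, 1)$, so that $a = 1/2$, and uses $n = 10$, $m = 8$, $k = 4$. Since the $x_j$ are independent and identically distributed, the first $k$ draws are used as the common elements in place of a random choice of $k$ values.

```python
import math
import random


def simulate_sums(n, m, k, trials=50_000, seed=0):
    """Draw pairs (y, z) of sums sharing k rectangular(0, 1) elements."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        common = sum(rng.random() for _ in range(k))
        y = common + sum(rng.random() for _ in range(n - k))
        z = common + sum(rng.random() for _ in range(m - k))
        pairs.append((y, z))
    return pairs


def regression_of_z_on_y(pairs):
    """Least-squares slope and intercept of z on y, and the sample correlation."""
    count = len(pairs)
    mean_y = sum(y for y, _ in pairs) / count
    mean_z = sum(z for _, z in pairs) / count
    s_yy = sum((y - mean_y) ** 2 for y, _ in pairs)
    s_zz = sum((z - mean_z) ** 2 for _, z in pairs)
    s_yz = sum((y - mean_y) * (z - mean_z) for y, z in pairs)
    slope = s_yz / s_yy
    intercept = mean_z - slope * mean_y
    rho = s_yz / math.sqrt(s_yy * s_zz)
    return slope, intercept, rho


n, m, k, a = 10, 8, 4, 0.5   # assumed illustrative values
slope, intercept, rho = regression_of_z_on_y(simulate_sums(n, m, k))
print("slope    ", slope, "theory", k / n)
print("intercept", intercept, "theory", a * (m - k))
print("rho      ", rho, "theory", k / math.sqrt(m * n))
```

With these values the theoretical slope is $k/n = 0.4$, the intercept is $a(m - k) = 2$, and $\rho = 4/\sqrt{80}$, which the simulated quantities should approximate.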

Northwestern University. 



A NOTE ON CONFIDENCE INTERVALS AND INVERSE PROBABILITY 

By Albert Wertheimer 


The object of this note is to discuss a certain property of confidence intervals 
from the point of view of inverse probability. We shall not go into detailed 
applications, but merely into fundamental ideas, so we shall work with distribu- 
tion functions that are continuous and satisfy conditions which are sufficient 
to insure the validity of the mathematical steps used.

A clear and concise statement of the subject is given in a paper by Neyman,¹
and we shall use it as the basis for our discussion. His presentation can be
summarized as follows: Let $x$ be a sample statistic having a distribution function

$$p(x, \theta), \qquad x_1 < x < x_2, \quad \theta_1 < \theta < \theta_2,$$

where $\theta$ is a parameter of the population. Now define two monotonic functions

$$x = f(\theta); \qquad x = g(\theta)$$

such that $f(\theta) < g(\theta)$, and

$$\int_{f(\theta)}^{g(\theta)} p(x, \theta)\,dx = 1 - \epsilon, \quad \text{for all } \theta. \tag{1}$$

Let the prior distribution function of $\theta$ be

$$\psi(\theta), \qquad \theta_1 < \theta < \theta_2.$$

It then follows directly that the probability for any pair of values $(x, \theta)$ lying
within the region enclosed by the curves is given by

$$\int_{\theta_1}^{\theta_2} \psi(\theta)\,d\theta \int_{f(\theta)}^{g(\theta)} p(x, \theta)\,dx = 1 - \epsilon, \tag{2}$$

regardless of the prior function $\psi(\theta)$. His conclusion then is this: Stating that

$$g^{-1}(x) < \theta < f^{-1}(x) \tag{3}$$

every time the observation gives us a value of $x$ equal to that given in (3) we
may in any one instance be wrong; this will happen only if the pair $(x, \theta)$ for this
observation lies outside the region enclosed by the curves; but from (2) the
probability for this to happen is $\epsilon$. This statement is equivalent to saying that
if for every observed $x$ we write the inequality (3), then for a large number of
samples, the fraction $1 - \epsilon$ of the inequalities will be found correct.

¹ Journal of the Royal Statistical Society, Vol. 97, part IV, 1934; pp. 589-93.

We note here that this is true only if in the inequality (3) $x$ is presumed to
range over its entire interval of definition. But if for an observation $x = x'$,
we mean to consider the corresponding inequality

$$g^{-1}(x') < \theta < f^{-1}(x') \tag{4}$$

as one member of the class of inequalities that could be written just for those
samples that had $x = x'$, then we can not assert that the inequality (4) has a
probability of $1 - \epsilon$ of being correct. In fact, any probability statement dealing
with this class must involve the prior distribution function $\psi(\theta)$; and if it is not
given, then we do not know in what percent of cases the restricted inequality (4)
will be found correct.

Let us nevertheless approach the problem from the viewpoint of inverse
probability. Having observed $x = x'$, the posterior probability of inequality
(4) being correct is

$$\eta(x') = \frac{\displaystyle\int_{g^{-1}(x')}^{f^{-1}(x')} \psi(\theta)\,p(x', \theta)\,d\theta}{\displaystyle\int_{\theta_1}^{\theta_2} \psi(\theta)\,p(x', \theta)\,d\theta}, \tag{5}$$

the numerator being the probability for the simultaneous occurrence of

$$x = x'; \qquad g^{-1}(x') < \theta < f^{-1}(x'),$$

and the denominator the probability² that $x = x'$, $\theta$ lying anywhere between
$\theta_1$ and $\theta_2$.

As long as $\psi(\theta)$ is unknown $\eta(x')$ cannot be evaluated; however its average
value $\bar{\eta}(x)$ with respect to $x$ can be evaluated. By definition of an average,

$$\bar{\eta}(x) = \int_{x_1}^{x_2} \eta(x)\,dx \int_{\theta_1}^{\theta_2} \psi(\theta)\,p(x, \theta)\,d\theta. \tag{6}$$

From (5) we have

$$\int_{g^{-1}(x)}^{f^{-1}(x)} \psi(\theta)\,p(x, \theta)\,d\theta = \eta(x) \int_{\theta_1}^{\theta_2} \psi(\theta)\,p(x, \theta)\,d\theta. \tag{7}$$

Integrating both sides of (7) over the entire range of $x$ we get

$$\int_{x_1}^{x_2} dx \int_{g^{-1}(x)}^{f^{-1}(x)} \psi(\theta)\,p(x, \theta)\,d\theta = \int_{x_1}^{x_2} \eta(x)\,dx \int_{\theta_1}^{\theta_2} \psi(\theta)\,p(x, \theta)\,d\theta.$$

Interchanging the order of integration, as is permissible under the assumptions,
we get

$$\bar{\eta}(x) = \int_{\theta_1}^{\theta_2} \psi(\theta)\,d\theta \int_{f(\theta)}^{g(\theta)} p(x, \theta)\,dx.$$

But since

$$\int_{f(\theta)}^{g(\theta)} p(x, \theta)\,dx = 1 - \epsilon, \quad \text{for all } \theta,$$

we finally get

$$\bar{\eta}(x) = 1 - \epsilon.$$

Thus when approached from the standpoint of inverse probability we see that
the average value of the posterior probability of the inequality (4) is precisely
the quantity $1 - \epsilon$ regardless of the prior distribution function $\psi(\theta)$.

² When we say probability that $x = x'$ we mean the probability that $x$ will lie in the
interval $x' \pm \tfrac{1}{2}dx$ to within terms of order $dx$.
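The distinction drawn above, between the overall fraction of correct inequalities and the fraction among samples with a restricted $x$, can be illustrated by simulation. The sketch below rests on assumptions that are not in the note: $x$ is taken to be normally distributed about $\theta$ with unit standard deviation, so that the curves are $f(\theta) = \theta - c$ and $g(\theta) = \theta + c$ with $c$ the appropriate normal deviate, $\epsilon$ is set at 0.10, and two arbitrary priors are tried. The unbounded ranges used here depart from the finite ranges of the text and are chosen only for convenience.

```python
import random
from statistics import NormalDist


def coverage_by_prior(draw_theta, eps=0.10, trials=200_000, seed=0):
    """Return (overall, restricted) fractions of correct inequalities of type (3).

    `restricted` is the fraction among samples with x < -1 only.
    """
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - eps / 2)   # half-width giving 1 - eps for every theta
    hits = restricted_hits = restricted_count = 0
    for _ in range(trials):
        theta = draw_theta(rng)             # parameter drawn from the assumed prior
        x = rng.gauss(theta, 1.0)           # observation given theta
        correct = x - c < theta < x + c     # inequality (3) for this model
        hits += correct
        if x < -1.0:
            restricted_count += 1
            restricted_hits += correct
    overall = hits / trials
    restricted = restricted_hits / restricted_count if restricted_count else float("nan")
    return overall, restricted


def exponential_prior(rng):
    """Assumed prior concentrated on theta > 0."""
    return rng.expovariate(1.0)


def uniform_prior(rng):
    """Assumed prior, rectangular on (-3, 3)."""
    return rng.uniform(-3.0, 3.0)


for name, prior in (("exponential", exponential_prior), ("uniform", uniform_prior)):
    overall, restricted = coverage_by_prior(prior)
    print(name, round(overall, 3), round(restricted, 3))
```

Under either prior the overall fraction should be close to $1 - \epsilon = 0.90$, whereas the fraction among samples with $x < -1$ depends on the prior; for the exponential prior it falls well below 0.90.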

In conclusion it is a pleasure to thank Dr. Deming for the criticisms and 
suggestions which he has made in connection with this note. 

Navy Department,

Washington. 



NOTE ON A MATCHING PROBLEM 

By Solomon Kullback

1. Introduction. There are to be found in the literature [1] a number of
discussions of the matching problem, i.e., the problem of deriving the distribution
of the number of correct matchings when two sequences of elements are placed
in correspondence. However, the formulation of the matching problem
discussed and illustrated herein is somewhat different from those problems already
discussed in the literature [1], and may be of interest. A rather general
statement of the problem follows.

2. The Problem. Consider urns $U_i$, $i = 1, 2, \ldots, n$, each of which contains
some or all of the $r$ different elements $E_1, E_2, \ldots, E_r$. The relative
proportions of the $r$ elements in the $i$-th urn are $p_{i1}, p_{i2}, \ldots, p_{ir}$ ($i = 1, 2, \ldots, n$)
such that

$$p_{i1} + p_{i2} + \cdots + p_{ir} = 1, \qquad i = 1, 2, \ldots, n, \tag{1}$$

$$p_{i1}^2 + p_{i2}^2 + \cdots + p_{ir}^2 = p_i, \qquad i = 1, 2, \ldots, n. \tag{2}$$

(Some $p_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, r$, may be zero.)

Assuming each urn to be an infinite source, consider two sequences made by 
drawing, at random, a single element from each urn in turn. If the two se- 
quences are placed in correspondence there will be a number of correct match- 
ings. What is the distribution of the number of correct matchings if the fore- 
going process be indefinitely repeated? 

3. Solution of the Problem. The probability that the elements in the $k$-th
position of the two sequences match may be derived by the following simple
considerations. Since all the drawings are independent, the probability that
both elements in the $k$-th position are $E_m$ is $p_{km}^2$. Accordingly, the probability
that both elements are the same, irrespective of their particular identity, is
$p_{k1}^2 + p_{k2}^2 + \cdots + p_{kr}^2 = p_k$.

The theory for the number of correct matchings in this case thus corresponds
to that for the Poisson series, which is well known [2]. For the special case in
which $p_k = p$, $k = 1, 2, \ldots, n$, the distribution of the number of correct
matchings is in accordance with the binomial $(q + p)^n$ where $q = 1 - p$.
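The distribution described in this paragraph can be computed directly. The following is a minimal sketch, with function names chosen for illustration: each position $k$ is treated as an independent trial with success probability $p_k$ from equation (2), and the distribution of the total number of matches is built up one position at a time. For identical urns it should agree with the binomial.

```python
import math


def match_probabilities(urns):
    """p_k = sum over j of p_kj squared, one value per urn, as in equation (2)."""
    return [sum(p * p for p in proportions) for proportions in urns]


def matching_distribution(p_list):
    """Distribution of the number of matches over independent positions.

    Position k contributes a match with probability p_list[k]; the result
    is a list whose x-th entry is the probability of exactly x matches.
    """
    dist = [1.0]
    for p in p_list:
        updated = [0.0] * (len(dist) + 1)
        for count, prob in enumerate(dist):
            updated[count] += prob * (1.0 - p)   # no match at this position
            updated[count + 1] += prob * p       # match at this position
        dist = updated
    return dist


def binomial_pmf(n, x, p):
    """Term of the binomial (q + p)^n corresponding to x matches."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


# Assumed illustration: 300 identical urns holding ten elements in equal proportions.
urns = [[0.1] * 10 for _ in range(300)]
dist = matching_distribution(match_probabilities(urns))
print(dist[30], binomial_pmf(300, 30, 0.1))
```

Unequal urns are handled by the same functions, since `matching_distribution` makes no use of the $p_k$ being equal.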

4. Numerical Illustration and Verification. The following illustration
corresponds to the special case in which the urns are taken to be identical with equal
proportions of each of the r elements.

Random sequences of 300 digits each were matched and the number of correct
matchings recorded. The result of 457 such observations is given in Table 1.

TABLE 1

Observed distribution of number of correct matchings for sequences of 300 random
digits each

Number of correct   Observed     Number of correct   Observed
matchings           frequency    matchings           frequency
      18                 1             32                35
      19                 2             33                25
      20                 3             34                15
      21                 5             35                20
      22                 9             36                20
      23                22             37                17
      24                18             38                 6
      25                21             39                10
      26                41             40                 7
      27                28             41                 1
      28                30             42                 3
      29                31             43                 1
      30                42             44                 0
      31                42             45                 2
                                       Total            457

Average number of correct matchings    Standard deviation
            29.9934                          4.8484

TABLE 2

Values of $P_x = (300!/x!(300 - x)!)(0.1)^x(0.9)^{300-x}$

  x     P_x        x     P_x        x     P_x        x     P_x
 14   0.00033     23   0.03240     32   0.06920     41   0.00875
 15    .00070     24    .04156     33    .06245     42    .00599
 16    .00139     25    .05099     34    .05449     43    .00400
 17    .00257     26    .05992     35    .04601     44    .00259
 18    .00449     27    .06756     36    .03763     45    .00164
 19    .00741     28    .07319     37    .02984     46    .00101
 20    .01156     29    .07628     38    .02294     47    .00061
 21    .01713     30    .07656     39    .01713     48    .00036
 22    .02413     31    .07409     40    .01242     49    .00020

In accordance with paragraph 3, the distribution in Table 1 should correspond
to the binomial distribution with $n = 300$ and $p = 10(1/10^2) = 1/10$. For the
binomial distribution we have $m = np = 30$, $\sigma = \sqrt{npq} = \sqrt{27} = 5.1962$.

To compare the observed distribution with the expected distribution we
calculated the values of $P_x = (300!/x!(300 - x)!)(0.1)^x(0.9)^{300-x}$ for values of $x$
from 14 to 49 inclusive which are given in Table 2.

To compare the observed and the theoretical distributions, and test the
“Goodness of Fit,” the distributions were grouped in classes of three. The
results are shown in Tables 3 and 4.

TABLE 3

Comparison of observed distribution with the theoretical distribution
$457(0.9 + 0.1)^{300}$

Number of correct              Frequency
matchings           Observed        Theoretical
                      F_o          f        F = 457f
     14-16              0        0.00242       1.1
     17-19              3         .01447       6.6
     20-22             17         .05282      24.1
     23-25             61         .12495      57.1
     26-28             99         .20067      91.7
     29-31            115         .22693     103.7
     32-34             75         .18614      85.1
     35-37             57         .11348      51.9
     38-40             23         .05249      24.0
     41-43              5         .01874       8.6
     44-46              2         .00524       2.3
     47-49              0         .00117        .5
     Total            457                    456.7

TABLE 4

    F_o               F             (F_o - F)²/F
  {0, 3}          {1.1, 6.6}            2.87
    17               24.1               2.09
    61               57.1                .27
    99               91.7                .58
   115              103.7               1.23
    75               85.1               1.20
    57               51.9                .50
    23               24.0                .04
 {5, 2, 0}      {8.6, 2.3, .5}          1.70
                                       10.48

$\chi_0^2 = 10.48$     $n = 8$     $P(\chi^2 > \chi_0^2) = .236$
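The goodness-of-fit statistic of Table 4 can be recomputed from the grouped frequencies. The short sketch below uses the observed and theoretical frequencies of Table 3, pooling the two lowest and the three highest classes as is done in Table 4; the variable and function names are illustrative.

```python
# Grouped frequencies from Table 3, with the end classes pooled as in Table 4.
observed = [0 + 3, 17, 61, 99, 115, 75, 57, 23, 5 + 2 + 0]
expected = [1.1 + 6.6, 24.1, 57.1, 91.7, 103.7, 85.1, 51.9, 24.0, 8.6 + 2.3 + 0.5]


def chi_square(obs, exp):
    """Sum of (F_o - F)^2 / F over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = chi_square(observed, expected)
print([round(c, 2) for c in contributions])
print(round(chi2, 2))
```

The nine pooled classes give the nine contributions listed in Table 4, and their sum should agree with the tabulated value of 10.48.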

5. Conclusion. The agreement between the observed distribution and the 
theoretical distribution derived on the basis of the argument in paragraph 3 
is quite satisfactory. 

We have shown herein, that if two sequences be matched under certain con- 
ditions, the distribution of the number of correct matchings will, in general, be 
that of a Poisson series and in special cases the binomial distribution. The 
theory was illustrated by an experiment which yielded results in satisfactory
agreement with the theory.

REPORT OF THE ANNUAL MEETING OF THE INSTITUTE

The fourth annual meeting of the Institute of Mathematical Statistics was 
held in Detroit, Michigan, on December 27-29, 1938, in conjunction with the 
meetings of the American Statistical Association and the Econometric Society. 
The program for the meetings was arranged by Professors S. S. Wilks and 
B. H. Camp. 

On Tuesday morning, December 27, the Institute held a session devoted to
contributed papers with Professor B. H. Camp, president of the Institute, in
the chair. At that time the following papers were presented:

1. Generalizations of the Laplace-Liapounoff Theorem.

W. G. Madow, Millbank Management Corporation, New York.

2. The standard errors of the geometric and harmonic means.

Nilan Norris, Hunter College.

3. Note on an integral equation in population analysis.

Alfred J. Lotka, Metropolitan Life Insurance Company, New York.

4. Optimum fiducial regions for simultaneous estimation of several population parameters
from large samples.

S. S. Wilks, Princeton University.

5. A mathematical contribution to immigration assessment.

Churchill Eisenhart, University of Wisconsin.

6. Contributions to the theory of statistical estimation.

A. Wald, Columbia University.

7. On the hypotheses underlying the applications of statistical methods to routine
laboratory analyses.

J. Neyman, University of California.

8. Commodity transformations and matrices.

Harold Hotelling, Columbia University.

9. Remarks on two methods of sample inspection.

E. G. Olds, Carnegie Institute of Technology.

Abstracts of these papers are given at the close of this report.

Immediately following the session just described, the Institute convened in
business session. At that time President Camp announced that the newly
elected officers for the year 1939 are: President, P. R. Rider, Washington
University; Vice-Presidents, C. C. Craig, University of Michigan, and S. S.
Wilks, Princeton University; Secretary-Treasurer, A. T. Craig, University
of Iowa.

The annual luncheon of the Institute was held at one o’clock on the same 
day. At the luncheon, Dr. Walter A. Shewhart, of the Bell Telephone Laboratories, 
addressed the Institute on “The Future of Statistics in Mass Production.” 
A summary of this address is included among the abstracts at the 
close of this report. 

On Wednesday morning, December 28, the Institute and the Statistical Association 
held a joint session devoted to the teaching of Business Statistics. 
Professor T. H. Brown presided. The following papers constituted the program: 

1. The teaching of undergraduate students. 

L. S. Kellogg, Ohio State University. 

2. The teaching of graduate students. 

O. W. Blackett, University of Michigan. 

3. A bead-sampling machine for use in the class room. 

Dickson H. Leavens, Cowles Commission for Research in Economics. 

Discussion: Harry P. Hartkemeier, University of Missouri. 

Richard L. Koselka, University of Minnesota. 

On the afternoon of the same day, the Biometric Section of the Statistical 
Association and the Institute presented the following program on Statistical 
Methods in Genetics Problems with Professor Lowell J. Reed as chairman: 

1. Tests of simple Mendelian inheritance in randomly collected data of one and two 
generations. 

Laurence H. Snyder and Charles W. Cotterman, Ohio State University. 

2. Statistical studies of the familial aspects of cancer in humans. 

Herbert L. Lombard, Massachusetts State Department of Public Health. 

3. Application of the method of likelihood ratios to the testing of hypotheses of simple 
Mendelian inheritance. 

S. S. Wilks, Princeton University. 

4. The application of statistical techniques to egg production data for the formulation of a 
breeding program. 

W. C. Thompson, New Jersey Agricultural Experiment Station. 

The Program Committees of the Institute and the Statistical Association 
arranged a joint session on Representative Sampling for Thursday afternoon, 
December 29. At that time the following papers were presented, with Professor 
Harold Hotelling presiding: 

1. On the mathematics of the representative method. 

Allen T. Craig, University of Iowa. 

2. Application of the theory of sampling to large scale surveys and censuses. 

Frederick F. Stephan, American Statistical Association. 

3. Further remarks on the mathematical aspects of representative sampling. 

J. Neyman, University of California. 

Discussion: Samuel A. Stouffer, University of Chicago. 

Churchill Eisenhart, University of Wisconsin. 

P. J. Rulon, Harvard University. 

The final session of the meetings was held on Thursday evening. This was 
a joint session with the Econometric Society and was devoted to Mathematical 
Statistics in Economics. Professor Irving Fisher presided and the following 
papers were given: 

1. On the hypothesis of linearity of regression in economic research. 

J. Neyman, University of California. 

2. The selection of variates for use in prediction. 

Harold Hotelling, Columbia University. 

3. Decomposition of time series on the basis of non-correlation principle. 

Wassily Leontief, Harvard University. 

Discussion: William G. Madow, Millbank Management Corporation, New York. 
Gerhard Tintner, Iowa State College. 


A. T. Craig, Secretary. 



ABSTRACTS OF PAPERS 

(Presented on December 27, 1938, at the Detroit meeting of the Institute) 

Generalizations of the Laplace-Liapounoff Theorem. W. G. Madow, Milbank 
Management Corporation, New York. 

The Laplace-Liapounoff Theorem states conditions under which a linear function of 
chance variables has a normal limiting distribution. 

In dealing with limiting distributions arising in the analysis of variance, regression 
analysis, etc., there occurred problems which required for their solution the derivation of 
the joint limiting distribution of several linear functions of chance variables and the joint 
limiting distribution of functions which were linear in one set of chance variables for fixed 
values of other sets of chance variables. 

These problems were solved by a matrix formulation of the Laplace-Liapounoff Theorem 
and by the introduction of a function whose convergence to zero in probability provided a 
sufficient condition for the existence of normal limiting distributions. 

Various generalizations with a view towards applications in multi-variate statistical 
analyses are discussed. The theorems provide a rigorous and complete basis for the derivation 
of limiting distributions of quadratic and bilinear forms. 

The Standard Errors of the Geometric and Harmonic Means. Nilan Norris, 
Hunter College. 

Although certain properties of the geometric and harmonic means have been investigated 
extensively, there seems to have been no derivation of expressions for their variances in 
cases where they are used as estimates of parameters of parent populations. 

Application of the modern theory of estimation makes it possible to develop simple 
and useful formulae for the standard errors of these two averages for each of the respective 
general classes of cases in which they are most suitable. 

As in other instances in which standard errors are used in tests of significance, fiducial 
or confidence limits may be employed to overcome certain limitations of the outmoded 
practice of relying solely on multiples of either probable or standard errors to determine 
whether or not a result exists merely because of sampling fluctuations. 

Note on an Integral Equation in Population Analysis. Alfred J. Lotka, Metro- 
politan Life Insurance Company, New York. 

In a population in which immigration and emigration are negligible, the number $N(t)$ 
of the population at time $t$ is connected with the annual births $B(t)$ and the probability $p(a)$ 
of surviving from birth to age $a$, by the obvious relation 

$$N(t) = \int_0^\infty B(t - a)\,p(a)\,da. \tag{1}$$

If $B(t)$ and $p(a)$ are given, $N(t)$ follows at once by direct integration. The inverse 
problem, given $N(t)$, to find $B(t)$, requires separate treatment. The case that $N(t)$ is 
given or can be expressed as a sum of exponential functions has been discussed by the 
author on a former occasion. In the present communication it is shown how the function 
$B(t)$ can be expressed as a series proceeding in ascending derivatives of $N(t)$. 

A second solution is also offered in which 

$$\frac{B(t)}{N(t)} = b(t),$$

the birth rate per head, is obtained as a series, the first and dominating term of which is 

$$b(t) = \frac{1}{\int_0^\infty e^{-r_t a}\,p(a)\,da}, \tag{2}$$

where $r_t$ is the rate of natural increase at time $t$. This development is of interest because 
it corresponds to the expression for $b$ in a population with constant birth rate, death rate, 
and rate of natural increase; that is, 

$$b = \frac{1}{\int_0^\infty e^{-ra}\,p(a)\,da}, \tag{3}$$

so that the new expression represents $b(t)$ as the corresponding value of $b$ in a Malthusian 
population, plus a series of correcting terms. 
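
The Malthusian expression for $b$ can be checked numerically. The following is only a sketch under stated assumptions: the survival function $p(a) = e^{-\mu a}$ is hypothetical (chosen because the integral then equals $1/(r + \mu)$ in closed form, so that $b = r + \mu$), and the function name, the truncation age and the step size are illustrative choices that do not come from the abstract.

```python
import math

def malthusian_birth_rate(r, survival, max_age=1000.0, step=0.01):
    """Approximate b = 1 / integral_0^inf exp(-r*a) * p(a) da (trapezoidal rule).

    The infinite upper limit is truncated at max_age, which is only safe
    when the integrand is negligible beyond that age (an assumption here).
    """
    n = int(round(max_age / step))
    total = 0.0
    for k in range(n + 1):
        a = k * step
        end_weight = 0.5 if k in (0, n) else 1.0  # trapezoid end points count half
        total += end_weight * math.exp(-r * a) * survival(a)
    return 1.0 / (total * step)

mu = 0.02  # hypothetical constant death rate, so that p(a) = exp(-mu * a)
r = 0.01   # hypothetical constant rate of natural increase
b = malthusian_birth_rate(r, lambda a: math.exp(-mu * a))
```

Under these assumptions the computed $b$ agrees with $r + \mu$, that is, the birth rate equals the rate of natural increase plus the death rate, as it must in a population with constant rates.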


Optimum Fiducial Regions for Simultaneous Estimation of Several Population 
Parameters from Large Samples. S. S. Wilks, Princeton University. 


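If a population has a distribution law $f(x, \theta)$ where $x$ is the variate and $\theta$ is a parameter, 
it is known (Annals of Mathematical Statistics, Vol. IX (1938) pp. 166-175) that under 
rather general conditions, confidence intervals, for a given confidence coefficient $\alpha$, which 
are shortest on the average, can be obtained from large samples of $n$ items by solving the 
equations 

$$\frac{1}{\sqrt{n}}\,\frac{\partial L}{\partial \theta} \Bigg/ \sqrt{E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^{2}\right]} = \pm\,\delta_\alpha \tag{1}$$

for $\theta$, where $\delta_\alpha$ is the normal deviate given by 
$\frac{1}{\sqrt{2\pi}} \int_{-\delta_\alpha}^{\delta_\alpha} e^{-t^2/2}\,dt = \alpha$, $L$ is the logarithm 
of the likelihood, i.e. $L = \sum_{i=1}^{n} \log f(x_i, \theta)$, and $E$ denotes mean 
value with respect to the probability law $f(x, \theta)$. 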

The present paper is an extension of the foregoing results to the case of several parameters. 
It is shown under fairly general conditions that if the distribution law of $x$ is a 
function $f(x, \theta_1, \ldots, \theta_h)$ depending on $h$ parameters, then for a confidence coefficient $\alpha$ 
the fiducial region of the $\theta$'s which is smallest in size on the average is given by the region 
in the space of the $\theta$'s for which 

$$\frac{1}{n} \sum_{i,j=1}^{h} a_{ij}\,\frac{\partial L}{\partial \theta_i}\,\frac{\partial L}{\partial \theta_j} < \chi_\alpha^2, \tag{2}$$

where $\chi_\alpha^2$ is such that $P(\chi^2 < \chi_\alpha^2) = \alpha$, where the probability is calculated from a $\chi^2$ distribution 
with $h$ degrees of freedom. The matrix $\|a_{ij}\|$ is the inverse of the matrix whose 
general element is 

$$E\left[\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right].$$

Similar results hold when $f$ is a function of several random variables as well as the parameters 
$\theta_1, \ldots, \theta_h$. 


A Mathematical Contribution to Immigration Assessment. Churchill Eisenhart, 
University of Wisconsin. 

A certain problem in assessing the size of an immigration can be stated mathematically 
as follows: A sample of size $N$ is drawn at random from a population in which the probability 
of $A$ is $p$. Let “not-$A$” be denoted by $B$. Then the sample will contain a frequency, 
say $x$, of $A$ and $N - x$ of $B$, $x$ being a random variable. This sample is now mixed together 
with a very much larger sample in which the elements are all $B$'s, and the $B$'s belonging 
to the original sample lost sight of. The problem is to estimate $N$ from the observed 
frequency of $A$, namely $x$, in the composite sample, $p$ being assumed known. The maximum 
likelihood estimate of $N$ is $x/p$. For large values of $x$, and a fortiori of $N$, confidence 
intervals for $N$ take the form $N_1 < N < N_2$ where 

$$N_1 = \frac{\left\{\sqrt{t^2q^2 + 4qx} - tq\right\}^2}{4pq}, \qquad N_2 = \frac{\left\{\sqrt{t^2q^2 + 4qx} + tq\right\}^2}{4pq}, \qquad q = 1 - p,$$

and the confidence coefficient is .95 if $t$ is set equal to 1.96, and is .99 if $t$ is set equal to 2.58. 
For small values of $x$ the solution is more difficult but charts are being prepared from 
which the confidence intervals can be read off. 
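
The large-sample limits can be evaluated directly. The sketch below assumes the usual normal approximation, in which the limits are the two values of $N$ satisfying $(x - Np)^2 = t^2 Npq$; the function name and the sample figures are hypothetical and serve only as an illustration.

```python
import math

def immigration_limits(x, p, t=1.96):
    """Large-sample confidence limits (N1, N2) for the sample size N.

    x is the observed frequency of A, p the known probability of A, and t the
    normal deviate that corresponds to the chosen confidence coefficient.
    """
    q = 1.0 - p
    root = math.sqrt(t * t * q * q + 4.0 * q * x)
    lower = (root - t * q) ** 2 / (4.0 * p * q)
    upper = (root + t * q) ** 2 / (4.0 * p * q)
    return lower, upper

x, p = 100, 0.1  # hypothetical observation: 100 elements of type A, p = 0.1
n1, n2 = immigration_limits(x, p)
```

For these hypothetical figures the maximum likelihood estimate $x/p = 1000$ lies between the two limits, which come out at approximately 831 and 1204.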


A Contribution to the Theory of Statistical Estimation. A. Wald, Columbia 
University. 

Let us denote by $f(x, \theta)$ a probability density function, where $\theta$ is a parameter. Denote 
by $\Omega$ the set of all possible values of $\theta$. The assumption that $\theta$ belongs to a subset $\omega$ of $\Omega$ 
is called a hypothesis. Let us consider a system $S$ of subsets of $\Omega$. Denote the hypothesis 
corresponding to an element $\omega$ of $S$ by $H_\omega$ and the system of all hypotheses corresponding 
to the elements of $S$ by $H_S$. Denote by $E$ a sample point in the $n$-dimensional sample 
space drawn from a population with the probability density function $f(x, \theta)$, where the 
value of $\theta$ is unknown. We have to decide by means of the sample point $E$ which hypothesis 
of the system $H_S$ should be accepted. That is to say, for each hypothesis $H_\omega$ 
we have to choose a region of acceptance $M_\omega$ in the sample space. The hypothesis $H_\omega$ 
will be accepted if and only if the sample point $E$ falls in the region $M_\omega$. Denote by $M_S$ 
the system of all regions $M_\omega$. The statistical problem to be solved is the question of 
how the system of regions $M_S$ should be chosen. 

In order to answer this question, a non-negative weight function $w(\theta, \omega)$ is introduced, 
which is defined for all values $\theta$ and for all elements $\omega$ of $S$. The weight $w(\theta, \omega)$ expresses 
the loss caused by accepting $H_\omega$ if $\theta$ is true. The probability of accepting $H_\omega$ multiplied 
by the weight $w(\theta, \omega)$ is called the risk of accepting $H_\omega$ if $\theta$ is true. Denote this risk by 
$r(\theta, \omega, M_S)$ (the risk depends obviously also on the system of regions $M_S$). The total 
risk of accepting a false hypothesis if $\theta$ is true, is given by 

$$r(\theta, M_S) = \sum_{\omega} r(\theta, \omega, M_S),$$

where the summation is to be taken over all elements $\omega$ of $S$ which do not contain $\theta$. 

Denote by $r(M_S)$ the maximum of $r(\theta, M_S)$ with respect to $\theta$. The system $M_S$ of regions 
for which $r(M_S)$ becomes a minimum is called the “best” system of regions relative to the 
weight function $w(\theta, \omega)$. Some properties of the best system of regions have been studied 
and the problem of its calculation has been treated. 
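
The definition of the "best" system of regions can be illustrated on a toy case small enough to search exhaustively. Everything in the sketch below is an assumption made for illustration and is not taken from the abstract: the parameter takes only the two values 0.3 and 0.7, the sample is the number of successes in five binomial trials, each hypothesis consists of a single parameter value, and the weight is 1 for accepting the false hypothesis and 0 otherwise.

```python
from itertools import product
from math import comb

N_TRIALS = 5
THETAS = (0.3, 0.7)   # hypothetical two-point parameter set
HYPOTHESES = (0, 1)   # index k stands for the hypothesis theta = THETAS[k]

def pmf(x, theta):
    """Binomial probability of x successes in N_TRIALS trials."""
    return comb(N_TRIALS, x) * theta ** x * (1 - theta) ** (N_TRIALS - x)

def weight(true_index, accepted_index):
    """Hypothetical 0-1 weight: unit loss for accepting the false hypothesis."""
    return 0.0 if true_index == accepted_index else 1.0

def max_risk(assignment):
    """Largest total risk over theta; assignment[x] is the hypothesis accepted at x."""
    risks = []
    for k, theta in enumerate(THETAS):
        risks.append(sum(weight(k, assignment[x]) * pmf(x, theta)
                         for x in range(N_TRIALS + 1)))
    return max(risks)

# Every way of splitting the sample space {0, ..., 5} into two acceptance regions.
systems = product(HYPOTHESES, repeat=N_TRIALS + 1)
best = min(systems, key=max_risk)
best_risk = max_risk(best)
```

In this toy case the search accepts the hypothesis $\theta = 0.3$ when at most two successes are observed and the hypothesis $\theta = 0.7$ otherwise. An exhaustive search is feasible only because the sample space has six points; it is not offered as the method of calculation treated in the paper.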

On the Hypotheses underlying the Applications of Statistical Methods to 
Routine Laboratory Analyses. J. Neyman, University of California. 

The problem considered is that of estimating the proportion, $p$, of certain designated 
elements of the population sampled, the estimate to be based on a random sample of $n$, 
drawn by some mechanical device, such as is sometimes used in industry and in laboratory 
work. Examples: (1) to estimate the proportion of defective manufactured products in 
mass production; (2) to estimate the proportion of seeds which are able to germinate in 
given conditions. One would expect that the sample proportion, say $q$, will be distributed 
in repeated samples according to the Binomial Law and that, consequently, in order 
to obtain the confidence limits for $p$, one should use the Clopper-Pearson graphs. However, 
the evidence obtained from special analyses on seed germination, made in the Seed 
Testing Station of Warsaw, Poland, shows that this assumption may not be true. The 
sampling there was carried out by means of a machine and involved a certain amount of 
mixing. As a result $q$ was more stable than was expected. It did not follow the Binomial 
Law at all, but a Normal one about $p$, with a standard deviation, $\sigma$, which could 
be well estimated from the sum of squares of deviations from respective means. For a 
considerable period of time (18 months) $\sigma$ retained a constant value (a characteristic of 
the action of the sampling machine) which was rather smaller than $(n^{-1}q(1 - q))^{1/2}$. 

Consequently, to have a preassigned frequency of correct statements concerning $p$, 
it was necessary to calculate the confidence intervals according to the formulae of the 
Normal Theory 

$$q - t\sigma < p < q + t\sigma$$

with an appropriate value of $t$. Probably similar situations are rather common. 
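
The contrast between the binomial standard deviation and the smaller machine value can be put in numbers. The figures below are hypothetical (the abstract reports none), and the function names are illustrative; the sketch only evaluates the two formulae quoted above.

```python
import math

def binomial_sd(q, n):
    """Standard deviation of a sample proportion under the Binomial Law."""
    return math.sqrt(q * (1.0 - q) / n)

def normal_theory_interval(q, sigma, t):
    """Interval q - t*sigma < p < q + t*sigma of the Normal Theory."""
    return q - t * sigma, q + t * sigma

q, n = 0.90, 400        # hypothetical germination proportion and sample size
sigma_machine = 0.010   # hypothetical stable value for the sampling machine
sd_binomial = binomial_sd(q, n)
low, high = normal_theory_interval(q, sigma_machine, 1.96)
```

With these assumed figures the binomial value is 0.015, so an interval built on it would be half as wide again as the Normal Theory interval built on the machine value of 0.010.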


Remarks on Two Methods of Sampling Inspection. E. G. Olds, Carnegie 
Institute of Technology. 


When the instructions for inspecting lots of size $m$ specify that samples of size $n$ be 
taken and the lot be passed without detailed inspection if no defectives are found, then, 
on the average, the maximum number of defectives are passed when the number of defects 
per lot is $\frac{m+1}{n+1}$ or $\frac{m+1}{n+1} - 1$. 

If the quality of a lot is to be checked by drawing pieces until a fixed number of defective 
pieces are found, it is important to know that the expected number $n_i$ necessary to obtain $i$ 
defects is $\frac{i(m+1)}{p+1}$, where there are $p$ defects in the lot. If $\frac{i(m+1)}{n_i}$ is used as an estimate 
of $p + 1$, it is convenient to observe that the variance of [...] 
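
Both statements of the abstract that survive intact can be verified by exact enumeration. The sketch below assumes sampling without replacement (the hypergeometric model), which the abstract does not state explicitly; the function names and the lot figures are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_passed(m, n, d):
    """Expected defectives passed for a lot of m pieces containing d defectives.

    The lot passes, carrying all d defectives, when a sample of n pieces
    drawn without replacement contains no defective piece.
    """
    return Fraction(d * comb(m - d, n), comb(m, n))

def worst_defect_counts(m, n):
    """Defect counts d at which the expected number passed is greatest."""
    values = {d: expected_passed(m, n, d) for d in range(m + 1)}
    largest = max(values.values())
    return [d for d, value in values.items() if value == largest]

def expected_draws(m, p, i):
    """Expected number of draws needed to find i of the p defectives (1 <= i <= p)."""
    total = comb(m, p)
    # The i-th defective falls at draw k in C(k-1, i-1) * C(m-k, p-i) arrangements.
    return sum(Fraction(k * comb(k - 1, i - 1) * comb(m - k, p - i), total)
               for k in range(i, m - p + i + 1))

worst = worst_defect_counts(19, 4)   # hypothetical lot: m = 19, n = 4
draws = expected_draws(19, 4, 2)     # hypothetical lot: m = 19, p = 4, i = 2
```

For $m = 19$ and $n = 4$ the ratio $(m+1)/(n+1)$ equals 4, and the enumeration returns the tied pair 3 and 4, as stated. For $m = 19$, $p = 4$, $i = 2$ the expected number of draws is $2 \cdot 20/5 = 8$.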
Commodity Transformations and Matrices. Harold Hotelling, Columbia 
University. 

If we regard the prices and quantities of n commodities as vectors we may apply the 
theory of linear transformations in various ways having economic and statistical significance. 
An example is the mixing of grains to produce results conforming to new specifications, 
as in international trade. Another kind of example arises in problems of multivariate 
statistical analysis such as those treated in my paper on “Relations between Two 
Sets of Variates” (Biometrika, 1936), concerned with properties invariant under internal 
linear transformations of the variates of each set. Prices transform contragrediently to 
quantities in all cases. Hence, if the prices and quantities of the same set of commodities 
are the two sets of variates, the allowable transformations are restricted. Consequently 
there are invariants in this case additional to those discussed in the paper mentioned. 
Another problem is the reduction of sets of linear demand functions to normal form and 
determination of invariants when transformations of prices must be contragredient to 
those of commodities. The question whether the demand functions are symmetrical is 
here of paramount importance, since symmetry is preserved by such transformations, 
and since there are known theoretical reasons to expect symmetry. For a non-singular 
set of linear symmetrical demand or supply functions there are no invariants under arbitrary 
sets of contragredient transformations; but for pairs of such sets of demand and 
supply functions there are invariants, namely the elementary divisors of the pair of matrices 
of coefficients. A set of demand or supply functions alone has invariants if its 
matrix $B$ is not symmetrical. If $B'$ denote the transverse or conjugate of $B$, the elementary 
divisors of $B + \lambda B'$ are such invariants. 
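
The statement that prices transform contragrediently to quantities can be illustrated with the grain-mixing example. In the sketch below the blend compositions, quantities and prices are hypothetical: if quantities transform by a matrix $A$, prices are taken to transform by the transposed inverse of $A$, so that the total value of the goods is unchanged.

```python
from fractions import Fraction

def mat_vec(a, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def contragredient(a):
    """Transposed inverse: the transformation of prices when quantities use a."""
    return transpose(inverse_2x2(a))

# Hypothetical blends: each column gives the share of each raw grain in one blend.
mix = [[Fraction(3, 4), Fraction(1, 2)],
       [Fraction(1, 4), Fraction(1, 2)]]
raw_quantities = [Fraction(7), Fraction(5)]
raw_prices = [Fraction(5), Fraction(3)]

a = inverse_2x2(mix)                      # raw quantities -> blend quantities
blend_quantities = mat_vec(a, raw_quantities)
blend_prices = mat_vec(contragredient(a), raw_prices)
value_raw = dot(raw_prices, raw_quantities)
value_blend = dot(blend_prices, blend_quantities)
```

With these assumed figures the two raw grains are re-expressed as 4 and 8 units of the two blends, priced at 9/2 and 4, and the total value is 50 in either description. This invariance of value is what the contragredient rule guarantees.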

The Future of Statistics in Mass Production.¹ Walter A. Shewhart, Bell 
Telephone Laboratories, New York. 

Much has been written about the application of statistical theory and technique in 
studying, discovering, and measuring the effects of an existing system of unknown or 
chance causes. Much remains to be written about the application of statistical theory 
and technique in finding out how to tinker with and modify an existing chance cause 
system until it behaves as we would have it do. In research, we use statistical theory in 
helping to predict the future effects of some existing cause system. The statistician knows 
that his predictions will be valid if certain assumptions about the cause system are justified. 
Perhaps the most important assumption of this type is that the particular effects 
of a chance cause system under study are random. In mass production, however, the 
statistician has learned by experience that chance cause systems producing random effects 
don’t just happen even under what we customarily consider to be the best regulated 
laboratory conditions. If the industrial statistician chooses to ignore this fact and makes 
predictions as if he were dealing with random cause systems, he may expect many of his 
predictions to fall far short of the truth; what is more, he knows that this fact will be 
discovered and his work discredited because in a continuing mass production process 
predictions are sure to be checked. Hence the industrial statistician in mass production 
must start not where the research statistician leaves off but, as it were, before the research 
statistician begins: that is, he must start by developing techniques for determining when 
we are justified in assuming that the underlying cause system is random. We thus arrive 
at a good starting point from which to consider the future of statistics in mass production. 

Experience in the control of quality has provided a practical technique for detecting 
and eliminating assignable causes of variability in the production process until a state of 
statistical control is reached where predictions based upon the assumption of randomness 
are likely to prove valid. It has also been shown elsewhere that by elimination of assignable 
causes of variability, we may make the most efficient use of raw materials, maximize 
the assurance of maintaining standard quality of manufactured goods, minimize the cost 
of inspection, and minimize the cost of rejections. Hence we may conclude that the use 
of statistics in mass production can be made to pay good dividends; such use can be made 
to have a bright future. On what then does that future depend? 

¹ Summary of an address delivered at a luncheon meeting of the Institute of Mathematical 
Statistics, Detroit, Michigan, December 27, 1938. 

To answer this question, we must consider the following three fundamental steps in the 
process of mass production: 

I. The specification of the quality of the thing wanted. 
II. The production of things designed to meet the specification. 
III. The inspection of the things produced to see if they meet the specification. 

An outstanding characteristic of the first step, specification, is the necessity of setting up 
and living within what we term a tolerance range* for each specified quality characteristic. 

If a producer contracts to live within some specified range and in taking steps II and III 
fails to do so, he usually loses a lot of money. Hence he must know how to set tolerance 
limits that he can meet. Moreover, if he is to be able to make the most efficient use of 
materials in many instances, he must close up as much as he feasibly can on the specified 
tolerance range. 

Obviously, however, one can not specify a practically attainable tolerance range out of 
thin air: one must be limited by what it is possible to do under commercial conditions of 
production in step II and this in turn is revealed by inspection in step III. We must also 
take into account the fact previously noted that any manufacturing process to begin with 
is almost certain not to be in a state of statistical control. In fact, this state can only be 
approached through the application of certain statistical techniques that have been found 
useful in detecting the presence of assignable causes that can be found and removed. A 
point to be stressed is that the three steps— specification, production, and inspection— in 
mass production, cannot be taken independently; instead, they must be coordinated. It 
also may be shown that maximum effectiveness in the use of statistical theory can only 
be attained by coordinating the applications in each of the three steps. It is significant 
to note that in order to attain the most efficient use of materials and processes by mini- 
mizing the tolerance range and in order to minimize the cost of production, one must 
make effective use of the results obtained in the course of commercial production, par- 
ticularly those in the third step, inspection. In fact, the three steps might be thought of 
as constituting a scientific experiment in which the objective is the attainment of the 
most efficient use of available materials in the production of manufactured goods. 

Broadly speaking, the statistician of the future has before him the opportunity of 
helping to develop this fundamental type of experiment in many respects like the way he 
is successfully helping today in so many fields of research to design experimental procedures 
that make the most efficient use of human effort. Certain differences, of course, exist. 
For example, as already noted, he must start by designing a statistical control technique 
for randomizing, as it were, the cause system through the elimination of assignable causes. 
Then he can use modern statistical techniques of research in much the same way described 
in the literature with reasonable assurance that resulting predictions will be found valid 
because he has first randomized his cause system. He must, however, go farther than 
indicated in the current literature of statistical research in that he must provide operationally 
verifiable meanings for statistical terms such as random variable, accuracy, precision, 
true value, probability, degree of rational belief, and the like.* This is particularly 
necessary in the steps of specification and inspection because the specification is 
often made the basis of a contractual agreement between producers and consumers. 


* The tolerance range is not to be confused with the fiducial range of modern statistics. 
The distinction between the two is set forth at some length in a forthcoming publication, 
Statistical Method from the Viewpoint of Quality Control, to be published shortly by the 
Graduate School of the United States Department of Agriculture. 

There is a sense in which the statistician's problem in helping to develop the mass 
production process so as to make the most effective use of information yielded by the 
process is much more complicated than the design of experiment usually considered in 
the literature of statistics. Whereas the customary statistical theory of design of experiment 
in research is concerned with comparatively small-scale experiments carried out 
under controlled conditions of the laboratory by a few people, the corresponding development 
of the mass production process must be carried out under commercial conditions on a 
large scale involving large numbers of people. For example, the three steps in the mass 
production process are usually carried out either by different companies or by different 
departments of the same company. The steps may involve the coordinated effort of 
literally hundreds and even thousands of employees, including physicists, chemists, en- 
gineers, sales agents, purchasing agents, lawyers, and economists. Very few of these 
people have ever had any training in statistics or probability and yet many of them must 
be sold on the use of statistics if the statistician is to have an opportunity of making 
his full contribution to the social and economic effectiveness of the mass production 
process. This situation constitutes a problem not only for those now in industry but 
also for those responsible for training the industrial leaders of tomorrow so that they 
will have sufficient knowledge of statistics to help them recognize the potential contributions 
of statistical theory and technique. 

In conclusion, then, we may say that in the future the statistician in mass production 
must do more than simply study, discover, and measure the effects of existing chance 
cause systems: he must devise means for modifying these cause systems in the best way 
to satisfy human wants. The statistician in mass production must not be satisfied with 
simply measuring demand for goods; he must help change that demand by showing, among 
other things, how to close up the tolerance range and improve the quality of goods. He 
must not be content with measuring production costs; he must help decrease production 
costs through the use of the techniques of statistical control. 

The future contribution of statistics in mass production lies not so much in solving 
the problems usually put to the statistician today by those not statistically trained as 
in taking a hand in helping to coordinate the steps of specification, production, and inspection 
considered as a scientific experiment for making the most efficient use of human 
effort in the production of goods to satisfy human wants. The long range contribution 
of statistics depends perhaps not so much upon getting a lot of highly trained statisticians 
into industry in the immediate future as it does in creating a statistically minded new 
generation of those physicists, chemists, engineers, and others who will in any way have a 
hand in developing and directing the mass production process of tomorrow. 


* An initial step in this direction has been taken in my Washington lectures. Loc. cit. 



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 
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ARTICLE II 
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1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 
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and another for a term of three years. Thereafter the Board of Directors shall elect 
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Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
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ARTICLE IV 
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1. A meeting for the presentation and discussion of papers, for the election of Officers, 
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91 




time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time, at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
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to the members of the Board by the Secretary-Treasurer at least five days prior to the 
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1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment shall 
have been sent to each voting member by the Secretary-Treasurer at least thirty days 
before the date of the meeting at which the proposal is to be acted upon. Voting may be 
in person or by mail. 
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BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and 

Committee on Publications 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings of 
the Board of Directors he may vote in all cases. At least three months before the date of 
the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the 
time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the correspondence 
of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the 
end of each year and present an abstract of the same at the annual meeting of the Institute 
after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time to 
carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership.

5. The Committee on Publications, under the general supervision of the Board of
Directors, shall have charge of all matters connected with the publications of the Institute,
and of all books, pamphlets, manuscripts and other literary or scientific material collected
by the Institute. Once a year this Committee shall cause to be printed in the Official
Journal the Constitution and By-Laws and a classified list of all the Members and Fellows 
of the Institute. 


ARTICLE II 
Dues

1. Members shall pay five dollars at the time of admission to membership and shall
receive the full current volume of the Official Journal. Thereafter, Members shall pay
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person’s name may be stricken from the rolls and all privileges of member- 
ship withdrawn. Such person may, however, be re-instated by the Board of Directors 
upon payment of the arrears of dues. 

ARTICLE III 
Salaries

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amendment
has been previously approved by the Board of Directors.

ON THE SAMPLING THEORY OF ROOTS OF DETERMINANTAL
EQUATIONS

By M. A. Girshick¹

In a recent paper² Hotelling has considered two functions of the covariances
of two sets of variates (having a multivariate normal distribution with s variates
in the first set, t variates in the second, $s \le t$) which he designates by Q and Z
and which he defines as follows:

(1.1)    $$Q^2 = \frac{(-1)^s C}{AB}, \qquad Z = \frac{D}{AB},$$

where A is the determinant of the covariances among the variates of the first 
set, B the determinant of the covariances among the variates of the second set, 
D the determinant of covariances of the two sets taken together, and C a deter- 
minant obtained from D by replacing the covariances among the variates of the 
first set by zeros. Both $Q^2$ and $Z$ are shown to be invariant under internal
linear transformations of either set of variates. 

In solving the problem of determining linear functions of the two sets of 
variates for which the multiple correlation is a maximum, Hotelling arrives at a 
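set of parameters $\rho_1, \rho_2, \ldots, \rho_s$ which he names “canonical correlations” and
which are the positive or zero roots of the determinantal polynomial

(1.2)    $$D(\lambda) = \begin{vmatrix}
-\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1s} & \sigma_{1,s+1} & \cdots & \sigma_{1,s+t} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda\sigma_{s1} & \cdots & -\lambda\sigma_{ss} & \sigma_{s,s+1} & \cdots & \sigma_{s,s+t} \\
\sigma_{s+1,1} & \cdots & \sigma_{s+1,s} & -\lambda\sigma_{s+1,s+1} & \cdots & -\lambda\sigma_{s+1,s+t} \\
\vdots & & \vdots & \vdots & & \vdots \\
\sigma_{s+t,1} & \cdots & \sigma_{s+t,s} & -\lambda\sigma_{s+t,s+1} & \cdots & -\lambda\sigma_{s+t,s+t}
\end{vmatrix}.$$

The $\rho$'s are equal in number to the variates of the first set and bear the following
relations to Q and Z:

(1.3)    $$Q^2 = \rho_1^2 \rho_2^2 \cdots \rho_s^2$$

(1.4)    $$Z = (1-\rho_1^2)(1-\rho_2^2)\cdots(1-\rho_s^2).$$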
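
As a small numerical illustration of the relations (1.3) and (1.4), the following sketch computes $Q^2$ and $Z$ from a list of canonical correlations. The function name and the sample values are ours and purely illustrative; they do not come from the paper.

```python
from math import prod


def vector_coefficients(rhos):
    """Return (Q^2, Z) for a list of canonical correlations, per (1.3) and (1.4)."""
    q_squared = prod(r * r for r in rhos)      # product of the squared correlations
    z = prod(1.0 - r * r for r in rhos)        # product of the complements
    return q_squared, z


# Hypothetical canonical correlations, chosen only for illustration.
q_squared, z = vector_coefficients([0.8, 0.5])
print(q_squared, z)
```

With these two values, $Q^2$ is close to 0.16 and $Z$ is close to 0.27, up to floating-point rounding.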

The corresponding functions for the sample covariances Hotelling designates 
by q and z, and the sample canonical correlations by $r_1, r_2, \ldots, r_s$. Under
the assumption of complete independence between the two sets of variates and 


¹ Most of this research was accomplished at Columbia University under a Grant-in-Aid
from the Carnegie Corporation of New York.

² Harold Hotelling, “Relations Between Two Sets of Variates,” Biometrika, Vol.
XXVIII, Dec. 1936.

³ The function was first considered by S. S. Wilks in Biometrika, Vol. XXIV, Nov. 1932.

in the case s = 2 and t = 2, he shows that the joint distribution of q and z is
of the form

(1.5)    $$\tfrac{1}{2}(n-2)(n-3)\, z^{\frac{1}{2}(n-5)}\, dq\, dz,$$

q and z satisfying the inequalities

$$0 \le z \le 1, \qquad 0 \le q \le 1, \qquad z \le (1-q)^2,$$

and the joint distribution of the canonical correlations $r_1$ and $r_2$ is of the form

(1.6)    $$(n-2)(n-3)(r_1^2 - r_2^2)(1-r_1^2)^{\frac{1}{2}(n-5)}(1-r_2^2)^{\frac{1}{2}(n-5)}\, dr_1\, dr_2,$$

where n is one less than the number in the sample for each variate.

I 

In Part I of this paper we shall, assuming independence between the two 
sets, find the joint moments of q and z for a general value of s and t and extend
the joint distribution of q and z and hence of the canonical correlations to the
case where there are two variates in the first set and any number of variates in
the second, i.e. s = 2 and t > 2.⁴

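1. Joint Moments of q and z. Since we are assuming complete independence
between the two sets of variates we may without any loss of generality represent
the sample values of the second set as points on the first t axes at unit distance
from the origin in a space of n dimensions. The matrix of observations in the
case of s variates in the first set and t variates in the second set will take the form

(1.7)    $$\begin{matrix}
x_{11} & x_{12} & \cdots & x_{1t} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2t} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{s1} & x_{s2} & \cdots & x_{st} & \cdots & x_{sn} \\
1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & \cdots & 0
\end{matrix}$$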

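The polynomial $D(\lambda)$ of (1.2) in terms of sample variances and covariances
calculated from (1.7) then becomes

(1.8)    $$D(\lambda) = \begin{vmatrix}
-\lambda a_{11} & \cdots & -\lambda a_{1s} & x_{11} & \cdots & x_{1t} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda a_{s1} & \cdots & -\lambda a_{ss} & x_{s1} & \cdots & x_{st} \\
x_{11} & \cdots & x_{s1} & -\lambda & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
x_{1t} & \cdots & x_{st} & 0 & \cdots & -\lambda
\end{vmatrix}$$

where $a_{ij} = \sum_{1}^{n} x_i x_j$.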

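⁴ This extension is a generalization of Hotelling's method, loc. cit.

We multiply the first s rows of (1.8) by $\lambda$ and factor out $\lambda$ from the last t
columns. This yields

(1.9)    $$D(\lambda) = \lambda^{t-s} \begin{vmatrix}
-\lambda^2 a_{11} & \cdots & -\lambda^2 a_{1s} & x_{11} & \cdots & x_{1t} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda^2 a_{s1} & \cdots & -\lambda^2 a_{ss} & x_{s1} & \cdots & x_{st} \\
x_{11} & \cdots & x_{s1} & -1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
x_{1t} & \cdots & x_{st} & 0 & \cdots & -1
\end{vmatrix}.$$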


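As a further simplification, we multiply the $(s+j)$th column by $x_{ij}$ for all j
from 1 to t and add the result to the $i$th column. When this is done for every
value of i from 1 to s and the resulting determinant expanded by means of the
last t columns, the determinantal polynomial (1.9) becomes

$$D(\lambda) = \lambda^{t-s} \begin{vmatrix}
b_{11} - \lambda^2 a_{11} & b_{12} - \lambda^2 a_{12} & \cdots & b_{1s} - \lambda^2 a_{1s} \\
\vdots & \vdots & & \vdots \\
b_{s1} - \lambda^2 a_{s1} & b_{s2} - \lambda^2 a_{s2} & \cdots & b_{ss} - \lambda^2 a_{ss}
\end{vmatrix}$$

or symbolically

(1.10)    $$D(\lambda) = \lambda^{t-s}\, |\, b_{ij} - \lambda^2 a_{ij} \,|$$

where $b_{ij} = \sum_{1}^{t} x_i x_j$.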

Hence the s roots of $D(\lambda)$ which do not necessarily vanish may be obtained
from the polynomial

(1.11)    $$Q(\lambda) = |\, b_{ij} - \lambda^2 a_{ij} \,|.$$

The coefficient of the highest power of $\lambda$ in $Q(\lambda)$ is given by $|a_{ij}|$, the
determinant of the elements $a_{ij}$. Taking this in conjunction with (1.3) and (1.4)
we see that

(1.12)    $$q^2 = \frac{|b_{ij}|}{|a_{ij}|}, \qquad z = \frac{|c_{ij}|}{|a_{ij}|},$$

where $c_{ij} = \sum_{t+1}^{n} x_i x_j$.

From the equations (1.12) we obtain

(1.13)    $$E\{|a_{ij}|^{\frac{1}{2}\alpha+\beta}\, q^{\alpha} z^{\beta}\} = E\{|b_{ij}|^{\frac{1}{2}\alpha}\, |c_{ij}|^{\beta}\}$$

where E stands for the mathematical expectation of the expressions in the { }.
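
The ratios in (1.12) are easy to evaluate directly for s = 2. The sketch below assumes the device used in the text, namely that the second set has been placed on the first t coordinate axes, so that $b_{ij}$ sums over the first t observations, $c_{ij}$ over the remaining n - t, and $a_{ij}$ over all n. The function names and the data are ours and serve only as an illustration.

```python
def product_sums(x, y, lo, hi):
    """2x2 matrix of product sums of x and y over observations lo..hi-1 (0-based)."""
    sxx = sum(x[k] * x[k] for k in range(lo, hi))
    syy = sum(y[k] * y[k] for k in range(lo, hi))
    sxy = sum(x[k] * y[k] for k in range(lo, hi))
    return [[sxx, sxy], [sxy, syy]]


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def q_squared_and_z(x, y, t):
    """Return (q^2, z) as the determinant ratios |b|/|a| and |c|/|a| of (1.12)."""
    n = len(x)
    a = product_sums(x, y, 0, n)   # all n observations
    b = product_sums(x, y, 0, t)   # first t observations
    c = product_sums(x, y, t, n)   # remaining n - t observations
    return det2(b) / det2(a), det2(c) / det2(a)


# Hypothetical data with n = 4 and t = 2.
result = q_squared_and_z([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], 2)
print(result)
```

For these data both canonical correlations have square 1/2, so $q^2$ and $z$ both come out as 1/4, in agreement with (1.3) and (1.4).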

It is obvious from the definition of $b_{ij}$ and $c_{ij}$ that the two determinants $|b_{ij}|$
and $|c_{ij}|$ are independently distributed. Moreover, the joint distribution of q
and z does not depend on the determinant $|a_{ij}|$. The truth of the latter statement
can be seen from the following geometrical considerations. If we consider
the sample values of each variate as a point in an n-dimensional space,
then the two sets of variates determine two flat spaces, one of s dimensions and
one of t dimensions in that space. A sample canonical correlation can then be
considered as the cosine of a certain minimum or stationary angle between two
lines, one line lying in the flat s space and the other in the flat t space. Since q
and z are functions of the canonical correlations, they therefore depend only on
lines and angles between two planes. The quantities $a_{ij}$, on the other hand,
depend on lines and angles lying entirely within one of these planes.

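From the above considerations we see that equation (1.13) can be written as

$$E(|a_{ij}|^{\frac{1}{2}\alpha+\beta})\, E(q^{\alpha} z^{\beta}) = E(|b_{ij}|^{\frac{1}{2}\alpha})\, E(|c_{ij}|^{\beta})$$

or

(1.14)    $$E(q^{\alpha} z^{\beta}) = \frac{E(|b_{ij}|^{\frac{1}{2}\alpha})\, E(|c_{ij}|^{\beta})}{E(|a_{ij}|^{\frac{1}{2}\alpha+\beta})}.$$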


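The $m$th moment of a determinant $|d_{ij}|$ of sums of sample cross products of p
variates is given by the formula⁵

(1.15)    $$E|d_{ij}|^m = \frac{2^{pm}}{|D_{ij}|^m} \prod_{i=1}^{p} \frac{\Gamma\!\left(\frac{n+2m+1-i}{2}\right)}{\Gamma\!\left(\frac{n+1-i}{2}\right)},$$

where $D_{ij}$ denotes the cofactor corresponding to $\sigma_{ij}$ divided by the determinant
$|\sigma_{ij}|$. Substituting (1.15) in (1.14) and simplifying, we get for the joint
moments of q and z

(1.16)    $$E(q^{\alpha} z^{\beta}) = \prod_{i=1}^{s} \frac{\Gamma\!\left(\frac{t+\alpha+1-i}{2}\right) \Gamma\!\left(\frac{n-t+2\beta+1-i}{2}\right) \Gamma\!\left(\frac{n+1-i}{2}\right)}{\Gamma\!\left(\frac{t+1-i}{2}\right) \Gamma\!\left(\frac{n-t+1-i}{2}\right) \Gamma\!\left(\frac{n+\alpha+2\beta+1-i}{2}\right)}.$$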
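
The moment formula (1.16) is a product of gamma-function ratios and is conveniently evaluated on the log scale. The sketch below follows our reading of (1.16), with $\alpha$ the power of q and $\beta$ the power of z; the function name is ours. As a check under that reading, for s = t = 1 the quantity $q^2$ is the square of an ordinary correlation coefficient, whose mean under independence with n = 3 is 1/3.

```python
from math import exp, lgamma


def joint_moment(alpha, beta, s, t, n):
    """E(q^alpha * z^beta) under independence, following our reading of (1.16)."""
    log_m = 0.0
    for i in range(1, s + 1):
        log_m += (lgamma((t + alpha + 1 - i) / 2)
                  + lgamma((n - t + 2 * beta + 1 - i) / 2)
                  + lgamma((n + 1 - i) / 2)
                  - lgamma((t + 1 - i) / 2)
                  - lgamma((n - t + 1 - i) / 2)
                  - lgamma((n + alpha + 2 * beta + 1 - i) / 2))
    return exp(log_m)


# s = t = 1, n = 3: E(q^2) should be close to 1/3 and E(z) close to 2/3.
mean_q_squared = joint_moment(2, 0, 1, 1, 3)
mean_z = joint_moment(0, 1, 1, 1, 3)
print(mean_q_squared, mean_z)
```

The two means add to one, as they must when s = 1, since then $z = 1 - q^2$.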


2. Joint Distribution of q and z for s = 2, t > 2. In order to determine
the joint distribution of q and z for s = 2 and t > 2, we shall first prove the
following lemma.

Lemma: Let q and z be defined as in (1.1) for two sets of variates having s variates
in either set and let q' and z' be similarly defined with s < t where s is the number
of variates in the first set and t the number of variates in the second set; then for
n = t + s, the joint distribution of $q^2$ and $z$ is identical with that of $z'$ and $q'^2$.

Proof. If the number of variates in either set is the same and n = t + s,
then by (1.12)

$$q^2 = \frac{|b_{ij}|}{|a_{ij}|}, \qquad z = \frac{|c_{ij}|}{|a_{ij}|},$$

⁵ Cf. S. S. Wilks, “Certain Generalizations in the Analysis of Variance,” Biometrika,
Vol. XXIV, Nov. 1932.



where

(1.17)    $$b_{ij} = \sum_{1}^{s} x_i x_j, \qquad c_{ij} = \sum_{s+1}^{t+s} x_i x_j, \qquad a_{ij} = \sum_{1}^{t+s} x_i x_j,$$

and s = t in (1.12), the second set here also containing s variates.

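However, for s < t, and n = t + s, we take for the second set of t variates
points on the t axes at unit distance from the origin in the (t + s)-dimensional
space perpendicular to the first s axes. The matrix of observations in this case
takes the form

(1.18)    $$\begin{matrix}
x_{11} & x_{12} & \cdots & x_{1s} & x_{1,s+1} & \cdots & x_{1,s+t} \\
x_{21} & x_{22} & \cdots & x_{2s} & x_{2,s+1} & \cdots & x_{2,s+t} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_{s1} & x_{s2} & \cdots & x_{ss} & x_{s,s+1} & \cdots & x_{s,s+t} \\
0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{matrix}$$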

Employing the same arguments as in equations (1.8), (1.9) and (1.10) we find
that

(1.19)    $$Q(\lambda) = |\, c_{ij} - \lambda^2 a_{ij} \,|, \qquad q'^2 = \frac{|c_{ij}|}{|a_{ij}|}, \qquad z' = \frac{|b_{ij}|}{|a_{ij}|},$$

where

$$b_{ij} = \sum_{1}^{s} x_i x_j, \qquad c_{ij} = \sum_{s+1}^{t+s} x_i x_j, \qquad a_{ij} = \sum_{1}^{t+s} x_i x_j.$$

Comparing these equations with (1.17) we see that

(1.20)    $$z = q'^2, \qquad q^2 = z'.$$

This proves the lemma.

Now let s = 2. Setting n = t + 2 in equation (1.5) and using the transformation
(1.20) we get for the joint distribution of q' and z'

(1.21)    $$\tfrac{1}{2}\, t(t-1)\, q'^{\,t-2}\, z'^{-\frac{1}{2}}\, dq'\, dz'.$$

Let r be the correlation between the two variates of the first set. The distribution
of r in samples for which n = t + 2 when the population correlation is
zero is known to be

(1.22)    $$\frac{\Gamma\!\left(\frac{t+2}{2}\right)}{\sqrt{\pi}\, \Gamma\!\left(\frac{t+1}{2}\right)} (1-r^2)^{\frac{1}{2}(t-1)}\, dr.$$

The distribution of r is independent of q and z. Hence, the joint distribution
of q', z', and r is given by the product of (1.21) and (1.22). Dropping the
primes from q' and z' in (1.21), we get for the joint distribution of the three
quantities in the case n = t + 2,

(1.23)    $$\frac{t(t-1)\, \Gamma\!\left(\frac{t+2}{2}\right)}{2\sqrt{\pi}\, \Gamma\!\left(\frac{t+1}{2}\right)}\, q^{t-2} z^{-\frac{1}{2}} (1-r^2)^{\frac{1}{2}(t-1)}\, dq\, dz\, dr.$$

We shall now derive the joint distribution of q and z for a general value of n
for s = 2, t > 2. We set $x_1 = x$, $x_2 = y$ and take the t sample variates of the
second set to be points on the first t axes at unit distance from the origin in a
space of n dimensions. As in (1.12) calculate q and z.


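(1.24)    $$q^2 = \frac{\sum_{1}^{t} x^2 \sum_{1}^{t} y^2 - \left(\sum_{1}^{t} xy\right)^2}{\sum_{1}^{n} x^2 \sum_{1}^{n} y^2 - \left(\sum_{1}^{n} xy\right)^2}, \qquad z = \frac{\sum_{t+1}^{n} x^2 \sum_{t+1}^{n} y^2 - \left(\sum_{t+1}^{n} xy\right)^2}{\sum_{1}^{n} x^2 \sum_{1}^{n} y^2 - \left(\sum_{1}^{n} xy\right)^2}.$$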

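We transform the points $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ to hyperspherical
coordinates, the transformation to be represented parametrically by the
equations

(1.25)    $$\begin{aligned}
x_1 &= \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{t-1} \sin\theta_t \\
x_2 &= \cos\theta_1 \sin\theta_2 \cdots \sin\theta_{t-1} \sin\theta_t \\
x_3 &= \cos\theta_2 \sin\theta_3 \cdots \sin\theta_{t-1} \sin\theta_t \\
&\;\;\vdots \\
x_t &= \cos\theta_{t-1} \sin\theta_t \\
x_{t+1} &= \cos\theta_t \cos\theta_{t+1} \\
x_{t+2} &= \cos\theta_t \sin\theta_{t+1} \cos\theta_{t+2} \\
&\;\;\vdots \\
x_{n-1} &= \cos\theta_t \sin\theta_{t+1} \sin\theta_{t+2} \cdots \sin\theta_{n-2} \cos\theta_{n-1} \\
x_n &= \cos\theta_t \sin\theta_{t+1} \sin\theta_{t+2} \cdots \sin\theta_{n-2} \sin\theta_{n-1}
\end{aligned}$$

with the same representation for the y's in terms of parameters $\phi_1, \phi_2, \ldots, \phi_{n-1}$.
It is to be observed that in (1.24) and (1.25) $\sum x^2 = 1$, $\sum y^2 = 1$. This we may
assume since q and z are invariant under such transformations.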

In this new coordinate system, our samples $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$
are taken as random points on a unit hypersphere about the origin in n dimensions.
There is no loss of generality in this since x and y are assumed to be
uncorrelated in the population and hence possess spherical symmetry of the
density distribution in a space of n dimensions.

The element of probability for the x points on this hypersphere is proportional
to the $(n-1)$-dimensional area on this sphere. Now the $(n-1)$-dimensional
area is given by

$$\sqrt{g}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1}$$

where g is a determinant of order $n-1$ in which the element in the $i$th row
and $j$th column is

$$\sum_{\alpha=1}^{n} \frac{\partial x_\alpha}{\partial\theta_i} \frac{\partial x_\alpha}{\partial\theta_j}.$$

When $i \ne j$, all these quantities vanish as can be seen by inspection from
(1.25). When $i = j$, we have

$$\begin{aligned}
\sum_{\alpha=1}^{n} \left(\frac{\partial x_\alpha}{\partial\theta_1}\right)^2 &= \sin^2\theta_2 \sin^2\theta_3 \cdots \sin^2\theta_t \\
\sum_{\alpha=1}^{n} \left(\frac{\partial x_\alpha}{\partial\theta_2}\right)^2 &= \sin^2\theta_3 \cdots \sin^2\theta_t \\
&\;\;\vdots \\
\sum_{\alpha=1}^{n} \left(\frac{\partial x_\alpha}{\partial\theta_{n-1}}\right)^2 &= \cos^2\theta_t \sin^2\theta_{t+1} \cdots \sin^2\theta_{n-2}.
\end{aligned}$$

Therefore

$$g = \sin^2\theta_2 \sin^4\theta_3 \cdots \sin^{2(t-1)}\theta_t\, \cos^{2(n-t-1)}\theta_t\, \sin^{2(n-t-2)}\theta_{t+1} \cdots \sin^2\theta_{n-2}$$

and hence the element of generalized area is given by

(1.26)    $$\sin\theta_2 \sin^2\theta_3 \cdots \sin^{t-1}\theta_t\, \cos^{n-t-1}\theta_t\, \sin^{n-t-2}\theta_{t+1} \cdots \sin\theta_{n-2}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1}.$$


Similarly we can show that the element of generalized area for the y point is

(1.27)    $$\sin\phi_2 \sin^2\phi_3 \cdots \sin^{t-1}\phi_t\, \cos^{n-t-1}\phi_t\, \sin^{n-t-2}\phi_{t+1} \cdots \sin\phi_{n-2}\; d\phi_1\, d\phi_2 \cdots d\phi_{n-1}.$$

The joint distribution of $\theta_1, \theta_2, \ldots, \theta_{n-1}$ and $\phi_1, \phi_2, \ldots, \phi_{n-1}$ (since the
$\theta$'s are independent of the $\phi$'s) is proportional to the product of (1.26) and (1.27).

We now introduce four new sets of variables, u, v, u', v', defined by the
following equations

(1.28)    $$x_i = u_i \sin\theta_t, \qquad y_i = v_i \sin\phi_t \qquad (i = 1, 2, \ldots, t)$$

(1.29)    $$x_j = u'_j \cos\theta_t, \qquad y_j = v'_j \cos\phi_t \qquad (j = t+1, \ldots, n).$$

The $u_i$ and $v_i$ can be regarded as two points on a sphere in a space of t dimensions
and $u'_j$ and $v'_j$ as two points on a sphere in a space of $n - t$ dimensions.
Let $\lambda$ be the angle between the two points u and v and $\mu$ the angle between
the two points u' and v'. Then

$$\cos\lambda = \sum_{i=1}^{t} u_i v_i; \qquad \cos\mu = \sum_{j=t+1}^{n} u'_j v'_j.$$

The probability element for $\lambda$ is proportional to $\sin^{t-2}\lambda\, d\lambda$ and that for $\mu$ is
proportional to $\sin^{n-t-2}\mu\, d\mu$.

From the definition of $u_i$ and $v_i$, we see that they depend only on $\theta_1, \theta_2, \ldots, \theta_{t-1}$;
$\phi_1, \phi_2, \ldots, \phi_{t-1}$ respectively, and $u'_j$ and $v'_j$ depend only on
$\theta_{t+1}, \theta_{t+2}, \ldots, \theta_{n-1}$; $\phi_{t+1}, \phi_{t+2}, \ldots, \phi_{n-1}$ respectively. It follows that the quantities
$\lambda$, $\mu$, $\theta_t$, and $\phi_t$ are independently distributed.

The joint distribution of the $\theta$'s and $\phi$'s we integrate between constant limits
with respect to all the variates except $\theta_t$ and $\phi_t$. This gives for the joint
distribution of $\theta_t$ and $\phi_t$

$$A_n \sin^{t-1}\theta_t\, \sin^{t-1}\phi_t\, \cos^{n-t-1}\theta_t\, \cos^{n-t-1}\phi_t\; d\theta_t\, d\phi_t$$

where $A_n$ is a constant depending only on n.

Multiplying this by the distributions of $\lambda$ and $\mu$ and dropping the subscript t
from $\theta$ and $\phi$ we get for the joint distribution of $\lambda$, $\mu$, $\theta$, and $\phi$

(1.30)    $$k_n \sin^{t-1}\theta\, \sin^{t-1}\phi\, \cos^{n-t-1}\theta\, \cos^{n-t-1}\phi\, \sin^{t-2}\lambda\, \sin^{n-t-2}\mu\; d\theta\, d\phi\, d\lambda\, d\mu$$

where $k_n$ is a constant depending on n. The limits of integration for $\theta$ and $\phi$
are 0 and $\pi/2$; for $\lambda$ and $\mu$ they are 0 and $\pi$.

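Expressing q and z in terms of the new quantities as defined in (1.25), (1.28)
and (1.29) we get

(1.31)    $$q^2 = \frac{\sum_{1}^{t} x^2 \sum_{1}^{t} y^2 - \left(\sum_{1}^{t} xy\right)^2}{1 - r^2} = \frac{\sin^2\theta\, \sin^2\phi\, \sin^2\lambda}{1 - r^2}$$

(1.32)    $$z = \frac{\sum_{t+1}^{n} x^2 \sum_{t+1}^{n} y^2 - \left(\sum_{t+1}^{n} xy\right)^2}{1 - r^2} = \frac{\cos^2\theta\, \cos^2\phi\, \sin^2\mu}{1 - r^2}$$

where

(1.33)    $$r = \sum_{1}^{n} xy = \sin\theta \sin\phi \cos\lambda + \cos\theta \cos\phi \cos\mu$$

is the sample correlation between x and y.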



We now consider a transformation of the variables $\theta$, $\phi$, and $\mu$ in (1.30) to
the new variables q, z, and r. Without troubling to compute the Jacobian J
of the transformation, we know that it is independent of n since the relations
(1.31), (1.32) and (1.33) do not involve n. Substituting from (1.31) and (1.32)
into (1.30) we get for the joint distribution of q, z, r, and $\lambda$

$$k_n\, \psi\, q^{t-2} z^{\frac{1}{2}(n-t-3)} (1-r^2)^{\frac{1}{2}(n-3)}\, dq\, dz\, dr\, d\lambda$$

where $\psi$ is independent of n. Integrating with respect to $\lambda$ between limits which
are independent of n, we get for the joint distribution of q, z, and r

(1.34)    $$k_n\, \psi'\, q^{t-2} z^{\frac{1}{2}(n-t-3)} (1-r^2)^{\frac{1}{2}(n-3)}\, dq\, dz\, dr.$$

But, for n = t + 2, this joint distribution reduces to (1.23). Therefore $\psi'$ is a
constant, so that (1.34) can be written as

$$k'_n\, q^{t-2} z^{\frac{1}{2}(n-t-3)} (1-r^2)^{\frac{1}{2}(n-3)}\, dq\, dz\, dr.$$

However, since the distribution of r is known to be

$$\frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\, \Gamma\!\left(\frac{n-1}{2}\right)} (1-r^2)^{\frac{1}{2}(n-3)}\, dr$$

we finally get for the joint distribution of q and z

$$K\, q^{t-2} z^{\frac{1}{2}(n-t-3)}\, dq\, dz$$

where K depends on n. The integral over the entire region defined by the
inequalities

$$0 \le q \le 1, \qquad 0 \le z \le 1, \qquad z \le (1-q)^2$$

must equal unity; the constant K is therefore readily found to be

$$\frac{(n-2)!}{2\,(t-2)!\,(n-t-2)!}.$$

Thus the joint distribution in the final form is

(1.35)    $$\frac{(n-2)!}{2\,(t-2)!\,(n-t-2)!}\, q^{t-2} z^{\frac{1}{2}(n-t-3)}\, dq\, dz.$$
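
The normalizing constant of (1.35) can be checked numerically. The sketch below is an illustration of ours, not part of the original derivation: it integrates $q^{t-2} z^{\frac{1}{2}(n-t-3)}$ over the region $z \le (1-q)^2$, doing the inner integral over z in closed form and the outer integral over q by the midpoint rule, and multiplies by the stated constant. It assumes integers with $t \ge 2$ and $n \ge t + 2$.

```python
from math import factorial


def density_constant(n, t):
    """The constant (n-2)! / (2 (t-2)! (n-t-2)!) appearing in (1.35)."""
    return factorial(n - 2) / (2 * factorial(t - 2) * factorial(n - t - 2))


def total_probability(n, t, steps=20000):
    """Integral of the density (1.35) over 0 <= q <= 1, 0 <= z <= (1 - q)^2."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        q = (i + 0.5) * h
        # Closed-form inner integral of z^((n-t-3)/2) from 0 to (1-q)^2.
        inner = 2.0 * (1.0 - q) ** (n - t - 1) / (n - t - 1)
        total += q ** (t - 2) * inner * h
    return density_constant(n, t) * total


check = total_probability(10, 3)
print(check)
```

The printed value should be very close to one; the same holds for other admissible pairs such as n = 6, t = 2, which corresponds to Hotelling's case (1.5).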

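Now by (1.3) and (1.4), $q = r_1 r_2$, $z = (1 - r_1^2)(1 - r_2^2)$, and hence the Jacobian

(1.36)    $$\frac{\partial(q, z)}{\partial(r_1, r_2)} = 2(r_1^2 - r_2^2).$$

Making the transformation in (1.35) we get the joint distribution of the
canonical correlations $r_1$ and $r_2$ (for the case s = 2 and a general value of t) in
the form

(1.37)    $$\frac{(n-2)!}{(t-2)!\,(n-t-2)!}\, (r_1^2 - r_2^2)(r_1 r_2)^{t-2} \left[(1-r_1^2)(1-r_2^2)\right]^{\frac{1}{2}(n-t-3)}\, dr_1\, dr_2.$$

II. JOINT LIMITING DISTRIBUTIONS OF CANONICAL
CORRELATIONS AND LATENT ROOTS

In formula (1.37) we set

$$k_1 = n r_1^2, \qquad k_2 = n r_2^2$$

and get for the joint distribution of $k_1$ and $k_2$

(2.1)    $$\frac{(n-2)!}{4\, n^t\, (t-2)!\,(n-t-2)!}\, (k_1 - k_2)(k_1 k_2)^{\frac{1}{2}(t-3)} \left[\left(1 - \frac{k_1}{n}\right)\left(1 - \frac{k_2}{n}\right)\right]^{\frac{1}{2}(n-t-3)}\, dk_1\, dk_2.$$

When $n \to \infty$, the quantity

$$\frac{(n-2)!}{n^t\,(n-t-2)!} \left[\left(1 - \frac{k_1}{n}\right)\left(1 - \frac{k_2}{n}\right)\right]^{\frac{1}{2}(n-t-3)}$$

approaches $e^{-\frac{1}{2}(k_1 + k_2)}$. Hence the limiting distribution of the two canonical correlations
is given by

(2.2)    $$\frac{1}{4\,(t-2)!}\, (k_1 - k_2)(k_1 k_2)^{\frac{1}{2}(t-3)}\, e^{-\frac{1}{2}(k_1 + k_2)}\, dk_1\, dk_2.$$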

We shall call (2.2) the “generalized chi-square” distribution and show that the
roots of the characteristic polynomial

(2.3)    $$\phi(k) = \begin{vmatrix} a_{11} - k & a_{12} \\ a_{21} & a_{22} - k \end{vmatrix}$$

are distributed in precisely this form. Here $a_{ij} = \sum x_i x_j$ where $x_1$ and $x_2$ are
normally and independently⁶ distributed with unit variance in the population
and zero mean in the sample.

Let $k_1$ and $k_2$ be the roots of (2.3). That is, $k_1$ and $k_2$ are the two roots of
the quadratic equation

(2.4)    $$k^2 - p_1 k + p_2 = 0$$

where

(2.5)    $$p_1 = k_1 + k_2 = a_{11} + a_{22}$$

(2.6)    $$p_2 = k_1 k_2 = a_{11} a_{22} - a_{12}^2.$$

In the absence of correlation in the population, the joint distribution of $a_{11}$,
$a_{22}$ and $a_{12}$ is known to be

(2.7)    $$h_n \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}^{\frac{1}{2}(n-3)} e^{-\frac{1}{2}(a_{11} + a_{22})}\, da_{11}\, da_{22}\, da_{12}$$

where $h_n$ is a constant depending only on n.

⁶ The part of the assumption relating to independence may be removed without loss of
generality. See last paragraph below.

We consider a transformation to the variables $p_1$, $p_2$ and $a_{12}$. From (2.5)
and (2.6) we calculate the Jacobian J of the transformation,

(2.8)    $$J = \frac{1}{a_{11} - a_{22}}.$$

From (2.5) and (2.6)

(2.9)    $$2a_{ii} = p_1 \pm (p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}$$

and hence

(2.10)    $$J = \frac{1}{(p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}}.$$

Substituting from (2.5) and (2.6) into (2.7) and multiplying by J, we get for
the joint distribution of $p_1$, $p_2$ and $a_{12}$

(2.11)    $$h_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\, \frac{dp_1\, dp_2\, da_{12}}{(p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}}.$$


We make the transformation $u = a_{12}^2$ and get for the joint distribution of $p_1$,
$p_2$ and u

(2.12)    $$\frac{h_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\, dp_1\, dp_2\, du}{2\,(bu - 4u^2)^{\frac{1}{2}}}$$

where $b = p_1^2 - 4p_2$.

Since both $a_{11}$ and $a_{22}$ are real, equation (2.9) shows that $b - 4u \ge 0$. Hence
the limits of integration for u are 0 and $b/4$. Integrating out u in (2.12) between
the above limits we obtain the joint distribution of $p_1$ and $p_2$.
Now the integral

(2.13)    $$\int_0^{b/4} \frac{du}{(bu - 4u^2)^{\frac{1}{2}}} = \left[\tfrac{1}{2} \sin^{-1} \frac{8u - b}{b}\right]_0^{b/4} = c$$

where c is some constant. Hence the joint distribution of $p_1$ and $p_2$ is given by

(2.14)    $$H_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\, dp_1\, dp_2.$$

By integrating (2.14) over the region $0 \le p_2 \le p_1^2/4$ and $0 \le p_1 \le \infty$ we get
$H_n = 1/[4\,(n-2)!]$.

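We next transform $p_1$ and $p_2$ in terms of $k_1$ and $k_2$ from (2.5) and (2.6) and get for the
joint distribution of $k_1$ and $k_2$

(2.15)    $$\frac{1}{4\,(n-2)!}\, (k_1 - k_2)(k_1 k_2)^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2}(k_1 + k_2)}\, dk_1\, dk_2.$$

This distribution is identical with that of (2.2) with n = t.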
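
The two steps of this section, obtaining the roots from the quadratic (2.4) and writing down their joint density, can be sketched as follows. The code reflects our reading of (2.15), with constant $1/[4(n-2)!]$; the function names, the truncation point and the grid spacing are our own choices, and the density routine assumes an integer $n \ge 3$. The midpoint-rule sum is only a rough check that the density integrates to about one.

```python
from math import exp, factorial, sqrt


def latent_roots_2x2(a11, a12, a22):
    """Roots k1 >= k2 of k^2 - p1*k + p2 = 0, with p1 and p2 as in (2.5), (2.6)."""
    p1 = a11 + a22
    p2 = a11 * a22 - a12 * a12
    disc = sqrt(max(0.0, p1 * p1 - 4.0 * p2))  # guard against rounding below zero
    return (p1 + disc) / 2.0, (p1 - disc) / 2.0


def root_density(k1, k2, n):
    """Our reading of the density (2.15) at k1 > k2 > 0, for integer n >= 3."""
    if not k1 > k2 > 0.0:
        return 0.0
    return ((k1 - k2) * (k1 * k2) ** ((n - 3) / 2.0)
            * exp(-(k1 + k2) / 2.0) / (4.0 * factorial(n - 2)))


def total_mass(n, upper=40.0, h=0.1):
    """Midpoint-rule estimate of the integral over 0 < k2 < k1 < upper."""
    steps = int(round(upper / h))
    total = 0.0
    for i in range(steps):
        k1 = (i + 0.5) * h
        for j in range(i):  # grid cells strictly below the diagonal k1 = k2
            k2 = (j + 0.5) * h
            total += root_density(k1, k2, n) * h * h
    return total


roots = latent_roots_2x2(2.0, 1.0, 2.0)
mass = total_mass(3)
print(roots, mass)
```

For the matrix with diagonal elements 2 and off-diagonal element 1 the roots are 3 and 1, whose sum and product agree with (2.5) and (2.6).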
The above is an example of a more general
Theorem: Let $r_1, r_2, \ldots, r_s$ be a set of simple finite canonical roots of the
two independent sets of variates $x_1, \ldots, x_s$ and $x_{s+1}, \ldots, x_{s+t}$. Let $k_i = n r_i^2$
$(i = 1, 2, \ldots, s)$. Then the joint limiting distribution of the k's approaches
the exact joint sampling distribution of the latent roots of a matrix of sample product
sums with t degrees of freedom of s normally distributed variates having unit variance
in the population.

Proof: The proof follows from equation (1.11). For let us multiply and
divide $\lambda^2$ in (1.11) by n and set $n\lambda^2 = k$. The determinantal polynomial
becomes

(2.16)    $$Q(k) = |\, b_{ij} - k s_{ij} \,|,$$

with $s_{ij} = a_{ij}/n$. Without any loss of generality, we so transform the first set of variates that
they become of zero correlation and unit variance in the population. Then it
follows that

$$\sigma_{ij} = \delta_{ij}$$

where $\delta_{ij}$ equals zero for $i \ne j$ and 1 for $i = j$.

Now let $P(x \ge a)$ stand for the probability that the variate x be greater
than or equal to some constant a. Then, by the Strong Law of Large Numbers
we can state that, given an $\epsilon > 0$ and a $\delta > 0$ there exists a positive integer $n_0$
such that for $n > n_0$

$$P\{|s_{ij} - \sigma_{ij}| > \delta\} < \epsilon.$$

If then we let n increase indefinitely, the quantity $b_{ij} = \sum_{1}^{t} x_i x_j$ remains fixed
while $s_{ij}$ approaches, in the probability sense, $\delta_{ij}$. Since the roots of a polynomial
are continuous functions of the coefficients, we can, by an extension of
the Law of Large Numbers, show that in the limit the roots of (2.16) will be
distributed like the roots of the polynomial

$$\phi(k) = |\, b_{ij} - k\delta_{ij} \,|.$$

This proves the theorem.

Corollary 1. The limiting distribution of $q^2$ in case of complete independence
between the two sets of variates approaches the exact distribution of a generalized
sample variance (i.e. a determinant of sample variances and covariances) with t
degrees of freedom. The proof follows from the fact that $q^2$ is a product of the
roots of (1.11) and therefore by the above theorem, is distributed in the limit like $|b_{ij}|$.

Corollary 2. The distribution of the sum of the squares of the canonical
correlations approaches in the limit a $\chi^2$ distribution with st degrees of freedom.
This is obvious since in the limit the sum of the squares of the roots, by the above
theorem, has the distribution of $b_{11} + b_{22} + \cdots + b_{ss}$ and each $b_{ii}$ is distributed
like $\chi^2$ with t degrees of freedom.

While the canonical roots of (1.2) are invariant under any non-singular linear 
transformations, the latent roots of a determinant of sample covariances are
invariant only under an orthogonal transformation. But there exists an orthogonal
transformation which reduces a set of variates having a multivariate
normal distribution to a set which are normally and independently distributed 
with variances equal to the latent roots of the population generalized variances 
of the original variates. Hence, in dealing with the distribution of latent roots, 
we may assume independence in the population without any loss of generality 
but the assumption of equal variance leads only to a special case. Moreover, 
the above consideration also explains the form of the asymptotic error of the 
sample latent root given in Part III of this paper.

III. ASYMPTOTIC STANDARD ERRORS OF LATENT ROOTS AND 
COEFFICIENTS OF PRINCIPAL COMPONENTS 

1. Many statisticians have had occasion to use in their statistical analyses 
characteristic roots (or as they are sometimes called “latent” roots) of deter- 
minants of correlations or covariances. Especially has this become true since 
the publication of Hotelling's paper on principal components.⁷ It is therefore of
great importance to find, if not their sampling distributions, at least their
limiting distributions and their asymptotic standard errors. This we shall do in
this paper for the case of non-vanishing simple roots and by the same method⁸
get the asymptotic variances and covariances of the coefficients of principal
components. We have already derived in Part II the sampling distribution of
the two latent roots of a determinant of covariances obtained from two normally
distributed variates having equal variance in the population. This
distribution is of no great importance in itself except that it gives us some idea
as to the form of the distribution in the general case.

In what follows, we shall use the convention that a repeated subscript in the 
same term stands for summation. If repeated subscripts appearing in a term 
are not to be summed, we shall place them in brackets following the expression 
in which they appear. Thus in the equation (3.1) below, we sum with respect 
to j but not with respect to q even though on the right hand side q appears twice. 

Let $x_1, x_2, \ldots, x_s$ be a set of variates which have a multivariate normal
distribution. We assume that these variates have been resolved into components
by Hotelling's method.⁹ Let $\gamma_1, \gamma_2, \ldots, \gamma_s$ be the principal components.
Then $x_i = a_{ij}\gamma_j$. The $a_{ij}$'s satisfy the following equations:

(3.1)    $$\sigma_{ij} a_{jq} = \lambda_q a_{iq}, \qquad [q]$$

(3.2)    $$a_{ip} a_{iq} = \lambda_p \delta_{pq}, \qquad [p]$$

⁷ “Analysis of a Complex of Statistical Variables into Principal Components,” The
Journal of Educational Psychology, Sept. and Oct., 1933. See also M. A. Girshick, “Principal
Components,” Journal of the American Statistical Association, Vol. 31, Sept. 1936.

⁸ The method here employed is parallel to the one used by Hotelling in his paper of 1936
in deriving asymptotic standard errors for canonical correlations.

⁹ Loc. cit.

where the symbol $\delta_{pq}$ has the value zero for $p \ne q$ and 1 for $p = q$, $\lambda_q$ is a root
of the characteristic equation

(3.3)    $$|\, \sigma_{ij} - \lambda\delta_{ij} \,| = 0$$

and $\sigma_{ij}$ is the population covariance of $x_i$ and $x_j$.

If we multiply (3.1) by $a_{ip}$, sum with respect to i and use (3.2), we get

(3.4)    $$a_{ip} a_{jq} \sigma_{ij} = \lambda_q^2 \delta_{pq}. \qquad [q]$$


When a root of (3.3) is simple and not equal to zero, the corresponding $a_{ij}$'s
and the root itself are definite analytic functions of the $\sigma_{ij}$'s over a region without
singularities. A set of sampling errors $d\sigma_{ij}$ in the covariances will then determine
a corresponding set of sampling errors in the $a_{ij}$'s and in the root.

We assume then, that the roots $\lambda_1, \lambda_2, \ldots, \lambda_t$ of (3.3) we are considering are
simple and non-vanishing. In terms of the derivatives of the analytic functions
we define

(3.5)    $$da_{jk} = \frac{\partial a_{jk}}{\partial\sigma_{pq}}\, d\sigma_{pq}, \qquad d\lambda_r = \frac{\partial\lambda_r}{\partial\sigma_{pq}}\, d\sigma_{pq}$$

where $d\sigma_{pq} = s_{pq} - \sigma_{pq}$, $s_{pq}$ being the corresponding sample covariance.

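Differentiating equation (3.1) and employing the above formulae we get

(3.6)    $$\sigma_{ij}\, da_{jq} + a_{jq}\, d\sigma_{ij} = \lambda_q\, da_{iq} + a_{iq}\, d\lambda_q. \qquad [q]$$

We now multiply this equation by $a_{ip}$, sum with respect to i, and use equations
(3.1) and (3.2). This yields:

(3.7)    $$\lambda_p a_{jp}\, da_{jq} + a_{ip} a_{jq}\, d\sigma_{ij} = \lambda_q a_{ip}\, da_{iq} + \lambda_q \delta_{pq}\, d\lambda_q. \qquad [p, q]$$

When $p = q$, the term $\lambda_p a_{ip}\, da_{ip}$ cancels out and equation (3.7) reduces to

(3.8)    $$\lambda_p\, d\lambda_p = a_{ip} a_{jp}\, d\sigma_{ij}. \qquad [p]$$

We change the subscripts p, i, j, to q, k, m, in (3.8) and multiply together the
two equations thus obtained. This gives:

(3.9)    $$\lambda_p \lambda_q\, d\lambda_p\, d\lambda_q = a_{ip} a_{jp} a_{kq} a_{mq}\, d\sigma_{ij}\, d\sigma_{km}. \qquad [p, q]$$

Hence

(3.10)    $$\lambda_p \lambda_q\, E(d\lambda_p\, d\lambda_q) = a_{ip} a_{jp} a_{kq} a_{mq}\, E(d\sigma_{ij}\, d\sigma_{km}), \qquad [p, q]$$

where the symbol E denotes the mathematical expectation or mean value of the
expression following.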

Now it can be easily shown by means of the characteristic function of a
multivariate normal distribution that

(3.11)    $$E(d\sigma_{ij}\, d\sigma_{km}) = \frac{\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk}}{n}$$

where n is one less than the number in the sample. Substituting this expression
in equation (3.10) and using (3.4) we get the following rather simple result

(3.12)    $$\lambda_p \lambda_q\, E(d\lambda_p\, d\lambda_q) = \frac{2\lambda_p^2 \lambda_q^2 \delta_{pq}}{n}. \qquad [p, q]$$

Setting $p = q$ in this formula we get

(3.13)    $$E(d\lambda_p)^2 = \frac{2\lambda_p^2}{n}. \qquad [p]$$

But when $p \ne q$

(3.14)    $$E[d\lambda_p\, d\lambda_q] = 0.$$

Let $l_1, l_2, \ldots, l_t$ be the corresponding latent roots of a determinant of
sample covariances. The sample latent root $l_p$ may be expanded about $\lambda_p$ in a
Taylor series of the form

(3.15)    $$l_p = \lambda_p + \frac{\partial\lambda_p}{\partial\sigma_{ij}}\, d\sigma_{ij} + \frac{1}{2} \frac{\partial^2\lambda_p}{\partial\sigma_{ij}\, \partial\sigma_{km}}\, d\sigma_{ij}\, d\sigma_{km} + \cdots$$

or, by (3.5),

(3.16)    $$l_p = \lambda_p + d\lambda_p + \cdots.$$

Squaring both sides of (3.16), taking the expected value, and using (3.13) we
find that the sample variance of a latent root $l_p$, apart from terms of higher order in
$n^{-1}$, is given by $2\lambda_p^2/n$.
If in (3.11) we set $i = j = k = m$, we get the variance of a sample variance,
and it is interesting to note that its form is identical with the first term of the
asymptotic expansion of the variance of a sample latent root.

The sample covariance of any two distinct roots is by (3.14) zero for the first
term of the asymptotic expansion. That is, the covariance is at least of order $n^{-2}$.
All the above results also follow from the fact, shown by the author in a
previous paper,¹⁰ that the coefficients of the principal components and hence the
latent roots are maximum likelihood statistics. This property of the latent
roots permits us also to state the following
Theorem: Let $\lambda_1, \lambda_2, \ldots, \lambda_t$ be any set of simple non-vanishing roots of (3.3).
For sufficiently large samples these will be approximated by certain of the latent roots
$l_1, l_2, \ldots, l_t$ of the samples. If $l_i - \lambda_i$ is divided by the standard error

$$\sigma_{l_i} = \lambda_i \sqrt{2/n},$$

the resulting variates have a distribution which, as n increases, approaches the
normal distribution of t independent variates of zero mean and unit standard
deviation.


¹⁰ Loc. cit.

Corollary: Let $\lambda_1$ be a maximum simple, non-vanishing root of (3.3) and let
$l_1$ be the corresponding maximum sample root. Then, $l_1 - \lambda_1$ divided by its
standard error has a distribution approaching normality in the limit.

2. The Variance of Log l. The formula for the standard error of the latent
root given above contains a population parameter $\lambda$ the numerical value of
which we usually do not know. It is therefore important to find a transformation
of the latent root to a new variate which will have for its leading term of the
asymptotic standard error a quantity independent of the population parameter.

Let $k = f(l)$ be such a transformation. Then $\kappa = f(\lambda)$ is the corresponding
transformation for the population root.

We now expand k in a Taylor series about $l = \lambda$

(3.17)    $$dk = f'(\lambda)\, dl + \tfrac{1}{2} f''(\lambda)(dl)^2 + \cdots$$

and get an approximation

(3.18)    $$dk = f'(\lambda)\, dl.$$

Squaring both sides and taking the expectation, we get

(3.19)    $$E(dk)^2 = [f'(\lambda)]^2\, E(dl)^2 = [f'(\lambda)]^2\, \frac{2\lambda^2}{n}.$$

Now set $E(dk)^2 = 2/n$. Then, from (3.19)

$$f'(\lambda) = 1/\lambda$$

or

(3.20)    $$f(\lambda) = \log\lambda.$$

Hence, if we set $k = \log l$, then

(3.21)    $$\sigma_k^2 = 2/n$$

is an approximation to the variance of k and is independent of any population
parameter.
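
One practical use of (3.21) is an approximate interval for a population latent root: since log l has asymptotic variance 2/n free of $\lambda$, an interval for $\log\lambda$ can be formed and then exponentiated. The sketch below is our own illustration and not a procedure given in the paper; in particular the normal multiplier 1.96 (for a level of roughly 95 per cent) and the function name are assumptions, and n is one less than the sample size, as in the text.

```python
from math import exp, sqrt


def latent_root_interval(l, n, z=1.96):
    """Approximate interval for lambda from a sample root l, using Var(log l) ~ 2/n.

    z is a standard normal multiplier chosen by the user (1.96 is our default).
    """
    half_width = z * sqrt(2.0 / n)          # half-width on the log scale
    return l * exp(-half_width), l * exp(half_width)


# Hypothetical sample root 4.0 from a sample of 51 observations (n = 50).
lower, upper = latent_root_interval(4.0, 50)
print(lower, upper)
```

The interval is symmetric on the log scale, so the product of its end points equals the square of the sample root.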

3. The Asymptotic Variances and Covariances of Roots of Determinants of
Correlations. While the formulas for the asymptotic standard errors of the
latent roots of a determinant of covariances are rather simple, this is not the
case with the roots of a determinant of correlations. In deriving the asymptotic
standard errors of simple non-vanishing roots of a determinant of correlations,
we again assume that the variates $x_1, x_2, \cdots, x_s$, which in this case are of unit
variance in the population, have been resolved into principal components. The
equations of the previous section, up to and including (3.10), remain the same
except that we substitute $\rho_{ij}$ for every $\sigma_{ij}$, where $\rho_{ij}$ is the population correlation
of $x_i$ with $x_j$. Thus equation (3.10) becomes

(3.22) $\lambda_p \lambda_q E(d\lambda_p\, d\lambda_q) = a_{ip} a_{jp} a_{kq} a_{mq} E(d\rho_{ij}\, d\rho_{km}),$ [p, q]

where $d\rho_{ij} = r_{ij} - \rho_{ij}$, $r_{ij}$ being the sample correlation between $x_i$ and $x_j$.
The expected value of $d\rho_{ij}\, d\rho_{km}$ is not, as in the case of the $\sigma$'s, given in the simple
form of (3.11); rather it is given asymptotically, the leading term in $n^{-1}$
being the following lengthy expression:

(3.23) $n E(d\rho_{ij}\, d\rho_{km}) = \rho_{ik}\rho_{mj} + \rho_{kj}\rho_{mi} - \rho_{ij}\rho_{ki}\rho_{mi} - \rho_{ij}\rho_{kj}\rho_{mj}$
$\qquad - \rho_{km}\rho_{ki}\rho_{kj} + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{ki}^2 + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{kj}^2$
$\qquad - \rho_{km}\rho_{mi}\rho_{mj} + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{mi}^2 + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{mj}^2.$ [i, j, k, m]
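
Expression (3.23) is easy to mis-transcribe, so a small computational check may
help. The sketch below evaluates the leading term of n E(dρ_ij dρ_km) from a
population correlation matrix. The function name and the list-of-lists input
format are assumptions made for this illustration. Setting k = i and m = j
should reproduce the familiar large-sample variance factor (1 − ρ²)² of a
single correlation coefficient.

```python
def corr_cov_leading_term(rho, i, j, k, m):
    """Leading term of n * E(d rho_ij * d rho_km), following (3.23).

    rho is a symmetric population correlation matrix (list of lists)
    with unit diagonal; i, j, k, m are zero-based variate indices.
    """
    r = rho
    products = (
        r[i][k] * r[j][m]
        + r[i][m] * r[j][k]
        - r[i][j] * r[i][k] * r[i][m]
        - r[i][j] * r[j][k] * r[j][m]
        - r[k][m] * r[k][i] * r[k][j]
        - r[k][m] * r[m][i] * r[m][j]
    )
    squares = r[i][k] ** 2 + r[i][m] ** 2 + r[j][k] ** 2 + r[j][m] ** 2
    return products + 0.5 * r[i][j] * r[k][m] * squares
```

For two variates with correlation 0.6, `corr_cov_leading_term(rho, 0, 1, 0, 1)`
gives (1 − 0.36)², and for mutually independent variates the covariance of two
correlations with no variate in common is zero.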

Substituting this in (3.22) and simplifying by means of equations (3.1) and
(3.4) we finally get


(3.24) 

2(Xp5j)g "1“ XpXgfltpfljgPi 

When $p = q$, (3.24) becomes


(3.25) 

MidXpf] = - [xp -b alalp^i ~ 2Xp E ajJ 
n\_ 1-1 J 

[p] 

When $p \neq q$,


(3,26) 

2 2 

n 

[p, ?] 


Hence (3.25) is the leading term of the asymptotic expansion of the variance
of $\lambda_p$, and (3.26) is the leading term of the asymptotic expansion of the
covariance of $\lambda_p$ and $\lambda_q$, where $\lambda_p$ and $\lambda_q$ are simple, non-vanishing roots of a
determinant of correlations.

4. Asymptotic Variances and Covariances of Coefficients of Principal Components
Derived from a Determinant of Covariances. Let $x_i = a_{ij}\gamma_j$ be the
equation of transformation of the variates $x_1, x_2, \cdots, x_s$ into principal
components. In what follows we assume that the latent roots of the determinantal
equation (3.3) are simple and none equal to zero. The last restriction makes the
determinant of covariances non-vanishing. The determinant of the $a_{ij}$'s will
therefore be also different from zero. With these assumptions in mind, we now
proceed to derive the asymptotic variances and covariances of the $a_{ij}$'s.

We set $p = q$ in (3.2) and differentiate the result. This yields:

(3.27) $d\lambda_p = 2 a_{lp}\, da_{lp},$ [p]

where the summation index $i$ was replaced by $l$. Substituting for $d\lambda_p$ from (3.8)
we get:

(3.28) $a_{ip} a_{jp}\, d\sigma_{ij} = 2\lambda_p a_{lp}\, da_{lp}.$ [p]

Now, when $p \neq q$, equation (3.7) reduces to

$\lambda_p a_{jp}\, da_{jq} + a_{ip} a_{jq}\, d\sigma_{ij} = \lambda_q a_{ip}\, da_{iq},$ [p, q]

or

(3.29) $a_{ip} a_{jq}\, d\sigma_{ij} = (\lambda_q - \lambda_p) a_{lp}\, da_{lq}.$ [p, q]

We combine equations (3.28) and (3.29) into one equation

(3.30) $a_{ip} a_{jq}\, d\sigma_{ij} = (\lambda_q + \epsilon_{pq}\lambda_p) a_{lp}\, da_{lq},$ [p, q]

where $\epsilon_{pq}$ has the value 1 when $p = q$ and $-1$ when $p \neq q$. The reciprocal of
$\lambda_q + \epsilon_{pq}\lambda_p$ (which is different from zero) we denote by $b_{qp}$. Then equation
(3.30) can be written as

(3.31) $a_{ip} b_{qp} a_{jq}\, d\sigma_{ij} = a_{lp}\, da_{lq}.$ [p, q]

Since the determinant $|a_{ij}|$ of the $a_{ij}$'s is different from zero, we can solve this
set of linear equations for the $da_{lq}$'s $(l = 1, 2, \cdots, s)$. To do this
we multiply equation (3.31) by $A^{tp}$, where $A^{tp}$ is the element of the $t$-th row and
$p$-th column of the inverse of the determinant $|a_{ij}|$, and sum with respect to $p$.
Since $A^{tp} a_{lp} = \delta_{tl}$ we get

(3.32) $A^{tp} a_{ip} b_{qp} a_{jq}\, d\sigma_{ij} = \delta_{tl}\, da_{lq} = da_{tq}.$ [q]

We now change the subscripts $i, j, t, p, q$ in (3.32) to $k, m, r, u, v$, respectively,
multiply the two equations thus obtained, and take the expected value:

(3.33) $E(da_{tq}\, da_{rv}) = A^{tp} A^{ru} a_{ip} a_{ku} b_{qp} b_{vu} a_{jq} a_{mv} E(d\sigma_{ij}\, d\sigma_{km}).$ [q, v]

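Substituting for $E(d\sigma_{ij}\, d\sigma_{km})$ its values from (3.11) and simplifying by means of
(3.4) we get:

(3.34) $E(da_{tq}\, da_{rv}) = \dfrac{\lambda_q^2\lambda_v^2}{n} A^{tv} A^{rq} b_{qv} b_{vq} + \dfrac{\delta_{qv}\lambda_q^2}{n} \sum_u A^{tu} A^{ru} b_{qu} b_{vu} \lambda_u^2,$

where we sum only with respect to $u$. We may simplify this formula to some
extent by employing the relation $A^{tp} = a_{tp}/\lambda_p$. (This relation is obtained
from (3.2) by multiplying each side of that equation by $A^{tp}$ and summing with
respect to $p$.) When this is done and the values for the $b$'s are substituted, the
final result becomes:

(3.35) $E(da_{tq}\, da_{rv}) = \dfrac{1}{n}\,\dfrac{\lambda_v \lambda_q a_{tv} a_{rq}}{(\lambda_q + \epsilon_{qv}\lambda_v)(\lambda_v + \epsilon_{qv}\lambda_q)} + \dfrac{\delta_{qv}\lambda_q^2}{n} \sum_{u=1}^{s} \dfrac{a_{tu} a_{ru}}{(\lambda_q + \epsilon_{qu}\lambda_u)(\lambda_v + \epsilon_{vu}\lambda_u)}.$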

From this we derive the following specific formulas : 


Eidatgdarg) = 

4n 


(3.36) 


d- 


Xg ^ diqdrq I ^ 


Clta dn 


(Xg - \.r 
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aft, 


(3,37) M(da>,) ] = ^ + 


a?i 


+ ^ + 

4X“ 


+ 




(X, - 


(3,38) E(da,,daJ = (q ^ t^) 

Formulas (3.36), (3.37) and (3.38) give us the leading terms of the asymptotic
expansions of the variances and covariances of the coefficients of the principal
components. It should be remarked that the coefficients of “mutual regression”
equations can easily be shown to be proportional to those of the principal
components. Hence their asymptotic standard errors and covariances may be
derived in a similar manner and will be of the same form.


5. Variances and Covariances of Latent Roots when the Population Roots are
Equal. Let $k_1, k_2, \cdots, k_p$ be the latent roots of a generalized sample variance
of $p$ normally distributed variates.

Ordinarily the subscripts of the roots designate their ranks, so that $k_1 > k_2 >
\cdots > k_p$. We may, however, assign to a root a subscript from 1 to $p$ without
any regard to its size.” If this is done randomly for every sample of $n$ observations
the mathematical expectation of $k_i^{a} k_j^{b} k_k^{c} \cdots$ will be the same for every
permutation of the subscripts $i, j, k, \cdots$. This fact permits us to calculate the
variances and covariances of the above roots.

We may assume, without any loss of generality, that the $p$ variates are
independently distributed,”* and furthermore we assume the population roots to
be all equal to unity. Then equation (3.11) becomes

(3.39) $E(s_{ij} s_{km}) = \delta_{ij}\delta_{km} + \dfrac{1}{n}(\delta_{ik}\delta_{jm} + \delta_{im}\delta_{jk}),$

where $s_{pq}$ is the sample covariance of $x_p$ and $x_q$ and $\delta_{pq}$ is the Kronecker delta.
Now it can be easily shown that

(3.40) $\sum_i s_{ii} = \sum_i k_i, \quad \sum_{i<j} (s_{ii}s_{jj} - s_{ij}^2) = \sum_{i<j} k_i k_j, \quad \sum_i s_{ii}^2 + 2\sum_{i<j} s_{ij}^2 = \sum_i k_i^2.$

Hence $E(k) = 1$, and

$E\left(\sum_i k_i^2\right) = E\left(\sum_i s_{ii}^2 + 2\sum_{i<j} s_{ij}^2\right)$

or

(3.41) $p\,E(k^2) = p\,E(s_{ii}^2) + p(p - 1)E(s_{ij}^2), \quad (i \neq j).$

Substituting from (3.39) in (3.41) we get

$E(k^2) = 1 + \dfrac{p + 1}{n}.$

“ This approach was suggested to the author by Professor Hotelling. 
See Part II, last Paragraph 
The variance of $k$ is therefore given exactly by

(3.42) $\sigma_k^2 = E(k^2) - 1 = \dfrac{p + 1}{n}.$

In a similar manner we find the covariances of $k_i$ and $k_j$ to be

(3.43) $\sigma_{k_i k_j} = -\dfrac{1}{n}.$
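
The moments in (3.42) and (3.43) lend themselves to a quick Monte Carlo check.
The sketch below is an illustration only, under assumptions not taken from the
paper: p = 2 independent unit-variance normal variates with known zero means, a
sample covariance matrix with divisor n, and subscripts assigned to the two
roots by a fair coin. For p = 2 the expected values are 3/n for the variance
and −1/n for the covariance. The helper names are hypothetical.

```python
import math
import random


def roots_2x2(s11, s12, s22):
    """Both latent roots of the symmetric matrix [[s11, s12], [s12, s22]]."""
    half_trace = 0.5 * (s11 + s22)
    radius = math.hypot(0.5 * (s11 - s22), s12)
    return half_trace + radius, half_trace - radius


def simulate_equal_roots(n=40, reps=4000, seed=7):
    """Return (mean, variance, covariance) of randomly labelled roots k_1, k_2."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(reps):
        s11 = s12 = s22 = 0.0
        for _ in range(n):
            x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            s11 += x * x
            s12 += x * y
            s22 += y * y
        k_a, k_b = roots_2x2(s11 / n, s12 / n, s22 / n)
        if rng.random() < 0.5:  # assign the subscripts without regard to size
            k_a, k_b = k_b, k_a
        first.append(k_a)
        second.append(k_b)
    m1 = sum(first) / reps
    m2 = sum(second) / reps
    var1 = sum((a - m1) ** 2 for a in first) / (reps - 1)
    cov12 = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (reps - 1)
    return m1, var1, cov12
```

With the default arguments the three returned values should lie close to 1,
3/40 and −1/40 respectively, up to simulation error.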


IV. DISTRIBUTION AND MOMENTS OF QUANTITIES RELATED TO q AND z

From the known distribution of q and z and their expressions in terms of the 
ratio of determinants given by (1.1) and (1.12), we can derive moments and 
distributions of several related functions of sample variances and correlations 
of two independent sets of variates. 

(4.1) Let p = - = by (1.12). 

2 I I 

Since the two determinants in (4.1) are independently distributed, the sampling
distribution of $p$, given in the above form, can be obtained for a general value of
$s$ and $t$ from Wilks' distribution of the ratio of independent generalized variances.
Thus, for $s = 2$ and $t > 2$, the distribution of $p$ is given by


(4.2) 


r(w - 2 ) dp 

2r(t - i)r(n - « - 1)^ (i + Vp)'-2’ 


When the number of variates in each set is the same, the numerator of $q^2$ in
(1.1) becomes the square of the determinant of covariances between the two sets of
variates. Thus


(4.3) 




I ! 1 (tafi I 


where i, j, take on values from 1 to s, «, take on values from s + 1 to 2s, and 

ft 

diiV “ • 

1 

If the two sets are independent, the quantities | | , j | , are inde- 

pendently distributed. Hence 


(4.4) E(| D = Eq-il a„- 1) Ei\ o„s l^-). 

Setting /3 = 0 in (1.16) and employing formula (1.15) we get for the moment of 
I a, a I 


(4.5) F(la,„r)=: 



“ Loc. cit., pp. 478-479. 
where $A^{uv}$ denotes the cofactor corresponding to $\sigma_{uv}$ divided by the determinant
$|\sigma_{uv}|$, $\sigma_{uv}$ being the population covariance of $x_u$ and $x_v$.

We may replace the product sums in (4.3) by sample correlations and, with the
assumption that all the variates come from independent populations, obtain the
moment of the determinant of correlations between the two sets as

/n + m — i 


(4.6) E{\u.n 




0 


/ n + m \ —1 


+ m — f + l\ /s + m — f+lV 

V 2 M 2 ) 


This follows from the expression for the ?n‘'' moment of q and the formula 


p. /A [^fn + 2k - t + 

W . -h 








(4.7) ^(1 1*) - S ' 

derived by Wilks.^^ 

If we set $s = t = 2$, the numerator of $q^2$ in (4.3) becomes the square of a
determinant of sample covariances (or correlations) known to psychologists as
the tetrad. We shall here derive its distribution under the assumption that the
four variates are independently distributed.

We write

(4.8) $q = \dfrac{T}{u_1 u_2}$

where

(4.9) $T = r_{13}r_{24} - r_{14}r_{23}, \quad u_1 = (1 - r_{12}^2)^{1/2}, \quad u_2 = (1 - r_{34}^2)^{1/2}$

and $q$ is taken as positive.

Now the distribution of $q$ for $s = t = 2$ is given by

(4.10) $(n - 2)(1 - q)^{n-3}\,dq$

and the distribution of $u$ is known to be


2r 


(4.11) 


© 


v'»'r(”-^') 

Hence the distribution of $u_1$, $u_2$ and $q$ is given by

««-2) '■’(l) 


w”-*(l - uTUu. 


(4.12) 






■p- (1 - g)"~®(wiW2)” “[(1 - w?)(l - "ul)] ’‘duiduidq 


“ Loo. cit , p 492. 
Performing the transformation (4.8) and integrating out $u_1$ and $u_2$ we get for
the distribution of the tetrad


(4.13) 


4(n-2)r^(2) 


r C — T)"-^ 

T — Wl)(l — ul) 

Ml 


dui, dm. 


All the moments of T can of course be obtained by setting s = 2 in (4.6).“ 


U. S. Department of Agriculture,
Washington, D. C.


“ The limiting distribution of the tetrad was given by J. L. Doob in an article entitled
“The Limiting Distributions of Certain Statistics,” Annals of Mathematical Statistics,
Vol. 6 (1935). For a more general distribution of the tetrad and other statistics considered
in this paper see W. G. Madow, “Contributions to the Theory of Multivariate Statistical
Analysis,” Transactions of the American Mathematical Society, Nov. 1938.



AN OPTIMUM PROPERTY OF CONFIDENCE REGIONS ASSOCIATED 
WITH THE LIKELIHOOD FUNCTION’ 

By S. S. Wilks and J. P. Daly 

One of the authors [1] has recently established a connection between the 
method of maximum likelihood and shortest average confidence intervals for the 
case of one unknown parameter, and has reported a generalization [2] of this 
result for the case of several parameters. It is the object of this paper to consider 
the several-parameter problem in greater detail and at the same time to make the 
previously obtained result slightly stronger, particularly in the one-parameter 
case. 

Let $x$ denote a set of random variables, and $\theta$ a set of parameters $\theta_1, \cdots, \theta_h$.
Suppose $\Pi_0$ is a population with the cumulative distribution function $F(x, \theta_0) =
F_0$, say. Then the logarithm of the likelihood associated with the population
$\Pi_0^n$ of random samples $O_n$: $x_1, x_2, \cdots, x_n$ drawn from $\Pi_0$ is

$L^n(x, \theta_0) = \sum_{\alpha=1}^{n} \log dF(x_\alpha, \theta_0).$

For a given sample $O_n$ we shall say that a set of functions $H_i^n(x, \theta)$ is of class K
if there exists a domain $R$ of parameter points $\theta$: $(\theta_1, \cdots, \theta_h)$ in a $\theta$-space such
that for each $\theta_0$ in $R$:

(i) $H_i^n(x, \theta_0) = H_{i0}^n$ is of the form $\sum_{\alpha=1}^{n} h_i(x_\alpha, \theta_0)$;

(ii) $h_i(x, \theta_0) = h_{i0}$ exists for all $x$ except possibly for a set of zero probability;

(iii) $E_0[h_{i0}] = 0$, where $E_0$ means that the expected value is taken for the
population $\Pi_0$;

(iv) $\| E_0[h_{i0}h_{j0}] \|$ exists and is non-singular;

(v) the moments $E_0[h_{i0}h_{j0}h_{k0}]$ are all finite.

(Here and throughout the remainder of the paper, the indices $i, j, k, l$ have the
range $1, \cdots, h$.) If, in addition,

(iii') $E_0[h_{i0}]$ can be differentiated under the integral sign;

(iv') the moments $E_0[h_{i0}h_{j0}]$ are differentiable with respect to the $\theta$'s;

the $H_i^n$ will be said to be of class K'.

We shall need the following lemma, which is very closely related to Theorem 1'
and Theorem 2 in [1] and which can be proved by the method of characteristic
functions.


¹ Incorporated in this paper is a note presented by one of us (cf. [2]) at a meeting of
the Institute of Mathematical Statistics, December 27, 1938.
Lemma: Let $H_i^n(x, \theta)$ be of class K for each $n$, and put

$B_{ij0}^n = \dfrac{1}{n} E_0[H_{i0}^n H_{j0}^n] = E_0[h_{i0}h_{j0}].$

Let $\| b_{ij0}^n \|$ be the positive definite matrix satisfying the equation

$\| b_{ij0}^n \|^2 = \| B_{ij0}^n \|$

and write

$\| b_{ij0}^n \|^{-1} = \| b_0^{n,ij} \|.$

Then for any point $\theta_0$ in $R$ the functions

(1) $\psi_{i0}^n = \dfrac{1}{\sqrt{n}} \sum_j b_0^{n,ij} H_{j0}^n$

computed from $\Pi_0^n$ have a joint distribution which converges in large samples to
normality, with the density function

$(2\pi)^{-h/2} \exp\left(-\tfrac{1}{2}\sum_i \psi_i^2\right).$

Now whenever we are justified in assuming a definite functional form for
$F(x, \theta)$, and have a set of functions $\psi_i(x, \theta)$ whose distribution under this last
assumption is known and is independent of the $\theta$'s, as is the case in the limit for
the functions (1), we can obtain, from a sample, information about the values of
the $\theta$'s. For, given any region $S$ in the space of the functions $\psi_i$, we can determine
the probability $P_0\{\psi_{i0} \subset S\}$ that in samples from $\Pi_0$ the point $(\psi_{10}, \cdots, \psi_{h0})$
will fall in the region $S$, even though we do not know the population values $\theta_0$.
Suppose, then, that we pick a region $S$ such that $P_0\{\psi_{i0} \subset S\} > .95$, and agree
that each time we encounter such a problem we shall substitute the observed
$x$'s into the $\psi$'s, and call the set of all points $(\theta_1, \cdots, \theta_h)$ for which $\psi(x, \theta) \subset S$
the confidence region $T$. If this procedure is followed consistently, we can assert
that the probability is more than .95 that the region $T$ thus determined contains
the true parameter point $\theta_0$.
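
The procedure just described can be made concrete in the one-parameter case.
The sketch below is an illustration under assumptions that are not in the
paper: a Bernoulli population with parameter p, the pivot ψ built from the
derivative of the log likelihood, and S taken as the interval −u ≤ ψ ≤ u with
u = 1.96 (roughly 95% under the limiting normal law). The confidence region T
is then the set of p whose pivot value falls in S. Function names are
hypothetical.

```python
import math


def score_pivot(successes, n, p):
    """psi = H / sqrt(n E[h^2]) for Bernoulli sampling.

    Here h = d log f / dp = (x - p) / (p (1 - p)), so that
    H = (successes - n p) / (p (1 - p)) and E[h^2] = 1 / (p (1 - p)).
    """
    return (successes - n * p) / math.sqrt(n * p * (1.0 - p))


def score_confidence_interval(successes, n, u=1.96):
    """End points of the set of p with |psi(p)| <= u.

    They are the roots of (successes - n p)^2 = u^2 n p (1 - p),
    a quadratic in p.
    """
    a = n * n + u * u * n
    b = -(2.0 * successes * n + u * u * n)
    c = float(successes * successes)
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)
```

For example, `score_confidence_interval(30, 100)` returns the two values of p
at which the pivot reaches ±1.96; every p between them maps into S, so that
interval is the region T for the observed sample.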

Evidently the size of the confidence region, i.e., the accuracy with which it
serves to locate the true parameter point $\theta_0$, depends upon our choice of the
auxiliary functions $\psi_i$. Consider now the case in which there is but one parameter
$\theta$, and let $\psi(x, \theta)$ and $\psi^*(x, \theta)$ be two functions with the same distribution
$D(u)$, where $D(u)$ does not depend on $\theta$. For the set $S$ of the above discussion
take the interval $\underline{u} < u < \bar{u}$. Then

$P_0(\psi_0 \subset S) = P_0(\psi_0^* \subset S) = \alpha$

where $\alpha = .95$, say. Given a set of observed $x$'s, $\psi(x, \theta)$ will map $S$ into a
confidence region $T$, while $\psi^*(x, \theta)$ will map it into a confidence region $T^*$.
Both $T$ and $T^*$ may be expected to contain the true value $\theta_0$ in 95% of the cases;
hence a reasonable way to compare the size of $T$ with that of $T^*$ is to compare the
quantities $\dfrac{\partial\psi}{\partial\theta}(x, \theta_0)$ and $\dfrac{\partial\psi^*}{\partial\theta}(x, \theta_0)$; for these derivatives give an indication of the
amount of change one can make in $\theta$ without forcing $\psi$ or $\psi^*$ out of the interval $S$.
The result obtained in [1] in this connection may now be stated as follows:

Let $H = \dfrac{\partial L}{\partial\theta}$ be of class K', and let $H^* = \sum_{\alpha=1}^{n} h(x_\alpha, \theta)$ be any other function of
class K'. Then in large samples from $\Pi_0$ both

$\psi = \dfrac{H}{\left(nE\left[\left\{\dfrac{\partial}{\partial\theta}\log dF\right\}^2\right]\right)^{1/2}}$

and

$\psi^* = \dfrac{H^*}{\left(nE\left[\{h(x, \theta)\}^2\right]\right)^{1/2}}$

are distributed almost normally with zero mean and unit variance. But the
confidence regions obtained from $\psi$ will, on the average, be smaller than those
from $\psi^*$, in the sense that, for large samples, the inequality


( 2 ) 



will hold (unless $h(x, \theta) = c\,\dfrac{\partial}{\partial\theta}\log dF$, in which case alone the inequality (2)
becomes an equality).

Now let us return to the several-parameter case. One method of attack which
suggests itself is to consider the Jacobian determinant

$\left| \dfrac{\partial\psi_{i0}}{\partial\theta_j} \right|$

for this bears the same relation to the area of the region $dS$ which maps into the
region

$dT$: $\theta_0 - \tfrac{1}{2}d\theta < \theta < \theta_0 + \tfrac{1}{2}d\theta$

as does the derivative

$\dfrac{\partial\psi_0}{\partial\theta}$

in the one parameter case.


To this end, let us put $L_i^n(x, \theta) = \dfrac{\partial L^n}{\partial\theta_i}$, and for each $n$ and for each $\theta_0$ in $R$
assume that

(a) $L_{i0}^n$ is defined for all $x$ except perhaps on a set of probability 0;

(b) $E_0[L_{i0}^n] = 0$;

(c) $E_0[L_{i0}^n]$ can be differentiated under the integral sign;

(d) $\| E_0[L_{i0}^n L_{j0}^n] \|$ exists and is non-singular;

(e) $E_0[L_{i0}^n L_{j0}^n]$ is differentiable in the $\theta$'s.

Let $H_i^n(x, \theta)$ be any other set of functions satisfying the same conditions. Set

$E_0[H_{i0}^n H_{j0}^n] = nB_{ij0}^n$

$E_0[L_{i0}^n L_{j0}^n] = nA_{ij0}^n$

and define the matrices

II 1 




;0 


aS"ll-lU.Vir 


Ko I 


ij* = iiB?,oii iibni = iur,oir 

Now consider the normalized functions

I?o = E «r'L”o 

^-1 


We then have 
(3) 


1 92/1*0 „ 9oo‘^ ^ r" -L ^ ^2/Jo 

n 99*: ddk ^ 39* 


and by virtue of assumptions (b) and (c) it follows that (cf. [1], pp. 171-2)

Eo\l ^“1 = -1 i: a?"21o[LroI/?o] 

Ln ddk J n ,~i 

In similar fashion 


Consequently 

(4) 


E, 




1 dSl, 

n 99* J 


Tl J>m\ 




and 

(5) 


(-1)^ 


Eq 


= \bIo\-*-]^Ieo[h^oL^o] 


We can find a relation between these two determinants by going over to the
matrix


11 Bo[LroL”oi !i 11 e,[l:,h;o] i! 


This matrix is positive definite unless there is a linear relation with constant
coefficients, say $\sum_i (c_i L_i + d_i H_i) = 0$, which holds for all $x$'s except a set of zero
probability; and in this event it is positive semidefinite. From the theory of
compound matrices [3] we can then conclude that the matrix whose elements
are the $h$-th order minors of this matrix, arranged in lexicographic order on both row and
column indices, has the same property, so that


\Eo[l:oL%]\.\Eo[h:oH^]\ 


> 1 Eo[L:,H%] 
The relations (4) and (5) then imply that


det jBoF-^“ 

> detBoFi—l 

\n 00* J 

In 00* J 


It may be observed that no use has been made of the assumption of linearity
(i) in deriving (6). And since in the one parameter case the determinants have
but one row and column, we see that in this case the result in [1] remains valid
for functions of an even more general type than those of class K'. In order to
give the inequality a statistical meaning it seems necessary, however, to require
not only that $H$ and $L$ satisfy (a), $\cdots$, (e) but also that in large samples
$\dfrac{1}{\sqrt{n}} H_i$ and $\dfrac{1}{\sqrt{n}} L_i$ tend to be distributed independently of $\theta$, with the same
(though not necessarily normal) distribution.

For the case of several parameters the transition from the above determinants
of expected values to the Jacobian determinants requires further argument and
further assumptions. To begin with, suppose that the $L_i^n$ and $H_i^n$ are of class K',
and that

(vi) the momenta Eo F are all finite, 

L oOf ddi J 

with a corresponding condition on the variances and covariances of „ Jog 

o9i Off,- 

dF(x, flo). Let us put 



The characteristic function of the YT, is 

• ' * , ^hh) ~ [exp (f ^ 1 Y,y)] 

= j^exp 

Expanding the exponential in powers of the i’s and using (vi), we find that 

so that we have 

lim = 1 


uniformly in every finite interval } j < M. A basic theorem on sequences of 
characteristic functions [4] then guarantees that for any e > 0 


lim Po 


1 dH^o 
n ddj 



= 0 


n—*eo 
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1 dli^ 

that is to say, that - converges stochastically to its expected value. Under 

the assumptions of this paragraph the same type of reasoning may be used to 

11 1 

show that the quantities - ,-L^o, and - all converge stochastically to 

'Wf 7l« ut/j 

their respective mean values. It will then follow from equaticm (3) that the 

1 sZ/" “ “ 

functions - ■ — converge stochastically to the values Eo 
n 90, ■ 


'1 al.V 

_n dOj _ ‘ 


In fact, it 


can be shown [5] that any polynomial in these functions must converge stochasti- 
cally to the same polynomial in their expected values. Hence, given any 

e > 0, the probability that the determinant 1 


1 sZ/Jo 

n dd, 


differs in samples from 


fl bL" 1 1 

Ho from the determinant Eo ~ by more than e can be made arbitrarily 

Ouj _ I 


small by taking n sufficiently large. Similarly, the determinant 


converges stochastically to 


Bo 


bers e, e', we have the relation 


1 9iro 


"1 9H."o~| 
_n dd, J 


1 bh:, 

n d9j 

Thus, given any two positive num- 


1 dHto 

n dO, 


- > 1 - e' 


(where + indicates the absolute values of the determinants), provided n is 
sufficiently large 

As in the one parameter case, the restrictions which have been put on the class 
of functions L and H are not entirely necessary. But it is difficult to replace 
them by any other set of conditions which are not obviously ad hoc. Let us 
now summarize the above results. 

Theorem 1 . If the functions Lt and H" satisfy the conditions (a), • • • (e), and if 


(f) 


the functions - and - 

n dd, n 99, 


converge stochastically to their mean values; 


(g) the large sample distribution of the functions —-=^ 

■s/n 


L(*o fs the same as that of the 


functions H," and is independent of the 9q's ; 

then in large samples the confidence regions derived from the L's will almost certainly
be smaller than those derived from the H's, in the sense that


lim Pq 


1 9^0 
in 99, 


> 


1 9i?ro 

n dd' 


1 


unless there is linear dependence between the L's and H's.

Theorem 2. The assumptions of Theorem 1 will be satisfied if the $L_i^n$ and $H_i^n$
are of class K', are linearly independent, and satisfy (vi).

Theorem 3. For the case of only one unknown parameter, the relation


dLio 

‘ L 3^1 , 


' 9L"\ 

equality holding only in case H” = e can he derived 


under assumptions (a), $\cdots$, (e) alone. Its interpretation in terms of smallest average
confidence intervals depends, however, on whether or not (g) is satisfied.

At first sight it may appear that the functions 

to which these theorems apply are too complicated to be of any practical use, 
involving as they do the square root of the inverse of the matrix 


I 5,”; II ^-\\E[H7H;]\\. 


But in employing the method of fiducial argument in the several parameter case 
there is no need to take the region S in the ^ space to be an interval 

. 

Instead, we may take $S$ to be the interior of the sphere

(7) $\sum_i \psi_i^2 < R^2.$

This enables us to avoid the computation of the $b^{n,ij}$; for

$\sum_{i=1}^{h} \psi_{ni}^2 = \dfrac{1}{n} \sum_{i,j=1}^{h} B^{n,ij} H_i^n H_j^n$

where $\| B^{n,ij} \|$ is the inverse of $\| B_{ij}^n \|$.

h 

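To indicate more precisely how the function $\sum_{i=1}^{h} \psi_{ni}^2$ may be used to determine
confidence regions for the parameter point $\theta$, we note that if the distribution
law of the $\psi_{ni}$ tends to the form

$(2\pi)^{-h/2} \exp\left(-\tfrac{1}{2}\sum_i \psi_i^2\right)$

then $\sum_{i=1}^{h} \psi_{ni}^2$, which is identically equal to $\dfrac{1}{n}\sum_{i,j} B^{n,ij} H_i^n H_j^n$, is approximately
distributed according to the $\chi^2$ law with $h$ degrees of freedom. We then have

(8) $P\left(\dfrac{1}{n} \sum_{i,j} B^{n,ij} H_i^n H_j^n < \chi_\alpha^2\right) = \alpha$

approximately, where $\chi_\alpha^2$ is given by the relation

$\dfrac{1}{2^{h/2}\,\Gamma(\tfrac{1}{2}h)} \int_0^{\chi_\alpha^2} (\chi^2)^{\frac{h}{2}-1} e^{-\frac{1}{2}\chi^2}\, d(\chi^2) = \alpha.$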

The confidence region $T$ corresponding to a particular sample $O_n$: $x_1, x_2,
\cdots, x_n$ consists of those points in the $\theta$ space for which $\dfrac{1}{n}\sum_{i,j} B^{n,ij} H_i^n H_j^n < \chi_\alpha^2$
when the $x$'s are substituted in the $H$'s. Since the region $T$ depends on the
sample, it is essentially a random variable and the probability is $\alpha$ that $T$ will
include the point $\theta_0$, that is, the point in the $\theta$-space corresponding to the values
of the $\theta$'s in the population.

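For example, suppose the population $\Pi$ is known to have the multinomial
distribution law

$f(x_0, \cdots, x_h; p_0, \cdots, p_h) = p_0^{x_0} \cdots p_h^{x_h}.$

In this case each $x$ has but two possible values, 0 and 1, and

(9) $x_0 + \cdots + x_h = 1, \quad p_0 + \cdots + p_h = 1.$

The likelihood function for random samples $O_n$ drawn from $\Pi$ has for its logarithm

$L^n = \sum_{c=0}^{h} n_c \log p_c$

where $n_c = \sum_{\alpha=1}^{n} x_{c\alpha}$, $x_{c\alpha}$ being the value of $x_c$ for the $\alpha$-th observation. Because
of (9) there are only $h$ independent parameters, say $p_i$ $(i = 1, \cdots, h)$. Thus

$L_i^n = \dfrac{n_i}{p_i} - \dfrac{n_0}{p_0}$

and

$A_{ij}^n = \dfrac{\delta_{ij}}{p_i} + \dfrac{1}{p_0}$

where $\delta_{ij}$ is unity if $i = j$ and 0 if $i \neq j$. It is not necessary to compute the
$a^{n,ij}$, for, as we have seen,

$\sum_{i=1}^{h} \psi_{ni}^2 = \dfrac{1}{n} \sum_{i,j=1}^{h} A^{n,ij} L_i^n L_j^n.$

And one can immediately verify that

$A^{n,ij} = \delta_{ij} p_i - p_i p_j$

so that we have

(10) $\sum_{i=1}^{h} \psi_{ni}^2 = \dfrac{1}{n} \sum_{i,j=1}^{h} (\delta_{ij} p_i - p_i p_j) \left(\dfrac{n_i}{p_i} - \dfrac{n_0}{p_0}\right) \left(\dfrac{n_j}{p_j} - \dfrac{n_0}{p_0}\right).$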
Since in this case the $L_i^n$ satisfy the conditions of the lemma, we know that
$\sum \psi_{ni}^2$ is distributed, in large samples, approximately like $\chi^2$ with $h$ degrees of
freedom.

As a matter of fact, (10) is precisely the Pearson $\chi^2$ which is ordinarily used
in connection with the problem of deciding whether a sample supports the
hypothesis that the population from which it has been drawn has specified values
of the $p$'s. For, making use of the fact that

$\sum_{i=1}^{h} (n_i - np_i) + (n_0 - np_0) = 0$

we find that 

^ 

Po 

h 

SO that 'Pni reduces to 

t-l 




-^ij(n, np,) 



n 1,7-1 



— npvYjnp, 


which is the familiar form. Thus in particular the Pearson $\chi^2$ is the best fiducial
function of its type which can be formed from H's satisfying Theorem 1, in the
sense that for sufficiently large samples its constituent functions $\bar{L}_i^n$ will almost
certainly have a greater Jacobian with respect to the parameters $p_i$ than will
the corresponding $\bar{H}_i^n$ computed from a set of $H_i^n$ independent of the $L_i^n$.
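
The identity between the quadratic form (10) and the Pearson χ² can be
confirmed numerically. The sketch below is a plain check, with hypothetical
function names and made-up counts used only for illustration: category 0 plays
the role of the dependent class, `counts` holds n_0, ..., n_h and `probs`
holds the hypothesized p_0, ..., p_h.

```python
def quadratic_form_10(counts, probs):
    """(1/n) sum_ij (delta_ij p_i - p_i p_j) L_i L_j, with L_i = n_i/p_i - n_0/p_0."""
    n = sum(counts)
    n0, p0 = counts[0], probs[0]
    # Derivatives of the log likelihood with respect to p_1, ..., p_h.
    scores = [ni / pi - n0 / p0 for ni, pi in zip(counts[1:], probs[1:])]
    total = 0.0
    for i, (pi, li) in enumerate(zip(probs[1:], scores)):
        for j, (pj, lj) in enumerate(zip(probs[1:], scores)):
            a_inverse = (pi if i == j else 0.0) - pi * pj
            total += a_inverse * li * lj
    return total / n


def pearson_chi_square(counts, probs):
    """Familiar form: sum over all h + 1 classes of (n_c - n p_c)^2 / (n p_c)."""
    n = sum(counts)
    return sum((nc - n * pc) ** 2 / (n * pc) for nc, pc in zip(counts, probs))
```

With, say, `counts = [18, 55, 27]` and `probs = [0.2, 0.5, 0.3]` the two
functions return the same value up to rounding, as the algebra in the text
requires.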

The confidence regions determined by (8) when the $H_i^n$ are replaced by the
$L_i^n$ have an associated optimum property which may be stated as


Theorem 4 ; Let Ao denote the differential o/ - 2 B””!-! H J* with respect to the 6 ,, 

It) 

evaluated at the true parameter point do , Let A* be the corresponding differential 
when the H" are replaced by the L” . Let the //,” and L^ satisfy conditions (i), 
(ii), ■ • • , (vi) and let the mean value of the product of two, three or. four factors 

taken from the set \h,o , \ he finite, no product containing more than two factors 

I bdkj 


of the type 

d log dFp 
hdi ' 


dh 1 

— jp^et similar assumptions hold for the set sl,o, 
dd, [ 

Then if n is sufficiently large 


ddi 


where 


( 11 ) 


LJo(|iAr) - > 0 


The equality m (11) will hold for all differential vectors if and only if each A,o is a 
linear function of the Lo ■ 
This theorem can be proved in a straightforward manner by using the following
characteristic functions


<pH 


tpL 


= exp (i 2 f Yj Uij 

= j^exp Y Uh,a + i 
= exp (i Y f 2 u„ 

\ .=1 i,i~i oOj / 

= j^exp 22 kh + ^ 52^ u„ J , 


where u^, = Uji . Now 

1 ^ dB”'^ 


Ao = - Y ~ h:h: de, + _ 

n 8 dk n 


Jt, nVf" 

ddk 


with a similar expression for A* . The problem of finding the mean values 
A§^ and A? ■) IS a matter of evaluating a set of fourth order deriva- 
tives of (fit and (fL at h = 0, Uij = 0. 

If the appropriate differentiations are carried out it is found that 


^^o(A^) = 4nr E 
Eo(aT) = 4nr Z 

L>,jAi 


BkmCuoCi,od9idd, 0 



Atjoddtddj -+- 0 


©: 


where 4i,o = ■Eo[hofio]i Buo = Bklhkafimli Citio Ea[hk<} , ho]- Denoting 

Eo ~ 

S = 4|Z M.,od9:d9, + O^i^l 

where || ikf,,o || = || ^„o — Y BfCktoCija H- If the hko and ho are linearly inde- 

k,l 

pendent then j] M^o jj is a positive definite matrix and hence Y ^<30 d9i dd, = i' 

ii3 

^say, will be non-negative and can vanish only when all d9i are zero. If each 
hko is a linear combination of the ho and if the hio are linearly independent, then 
each ho IS a linear combination of the hu . In this case it can be readily shown 
that every element in |1 M,,o || will vanish, and hence S' = 0. 

In case some of the $h_{i0}$ are linearly dependent on the $l_{i0}$, it can be shown that
$\| M_{ij0} \|$ is positive semidefinite, that is, there exists no differential vector for which $S'$
is negative, although there will exist non-zero differential vectors for which $S'$
is zero.
It can be shown under the assumptions made in Theorem 4 that $\dfrac{1}{n}(\Delta_0^{*2} - \Delta_0^2)$
actually converges stochastically to $4S'$, and thus if the $h_{i0}$ and $l_{i0}$ are linearly
independent, the difference $\dfrac{1}{n}(\Delta_0^{*2} - \Delta_0^2)$ converges stochastically to a positive
number. Stated in another way: for sufficiently large samples, the square of
the differential change in $\dfrac{1}{n}\sum_{i,j} A^{n,ij} L_i^n L_j^n$ for a given change $d\theta_i$ in the $\theta_i$ from
the values $\theta_{i0}$ will almost certainly exceed that of $\dfrac{1}{n}\sum_{i,j} B^{n,ij} H_i^n H_j^n$. The
statistical interpretation of this result amounts to the following: by taking
sufficiently large samples, we can make it as certain as we please that the confidence
regions for locating $\theta_0$ determined by using $\dfrac{1}{n}\sum_{i,j} A^{n,ij} L_i^n L_j^n$ in (8) will be smaller
than those determined by using $\dfrac{1}{n}\sum_{i,j} B^{n,ij} H_i^n H_j^n$ in (8).

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ON SOME PROPERTIES OF MULTIDIMENSIONAL DISTRIBUTIONS 


By J. Lukomski 

If, in a system of random variables $x_1, x_2, \cdots, x_n$, some variables are
connected by a functional (exact) dependence, the n-dimensional distribution
law has a degenerate character. In other words, in this case the probability
is not distributed over the whole n-dimensional space, but is concentrated on a
manifold of a smaller number of dimensions which may be called the skeleton
of the distribution.

The character and the dimensionality of this manifold are determined by the
character and the number of functional connections between the variables
$x_1, x_2, \cdots, x_n$. If all these connections are linear, the skeleton will be a linear
manifold (hyperplane). The investigation of the skeleton of a distribution
is obviously of interest from the theoretical as well as from the practical
point of view.

In the present paper we establish some criteria which enable us to determine,
for any distribution possessing finite moments of the first and second order, the
linear skeleton and to find the variations of the dimensionality of this manifold
when the variables are subjected to a linear transformation.¹

We also apply the obtained results to the case of a multidimensional normal
distribution (generalized by H. Cramér to the case of linear dependence between
variables).

§1

Let

(1) $x_1, x_2, \cdots, x_n$

be a system of random variables defined in the n-dimensional euclidean space
$R_n$ by the multidimensional distribution function $F(x_1, x_2, \cdots, x_n)$. The
function $F$ is defined on all Borel sets in $R_n$. We assume the existence of the
following moments:

$E(x_i) = \int\!\!\int \cdots \int_{R_n} x_i\, dF(x_1, x_2, \cdots, x_n) = 0$

$E(x_i x_j) = \int\!\!\int \cdots \int_{R_n} x_i x_j\, dF(x_1, x_2, \cdots, x_n) = \mu_{ij}$

where the integrals are to be understood in the sense of Lebesgue-Radon.


¹ The questions of degeneracy of a statistical distribution were for the first time
considered, from a somewhat different point of view, by R. Frisch [1].
If the variables $x_1, x_2, \cdots, x_n$ are connected by a relation of the form $c_1x_1 +
c_2x_2 + \cdots + c_nx_n = 0$ $(\sum c_i^2 \neq 0)$ (are linearly dependent), we call this relation
a linear bond of the distribution $F$.

We shall call a system of linear bonds of the distribution $F$ complete, if all
bonds of the system are linearly independent and every linear bond of the
distribution depends linearly on the bonds of the system.

By the (linear) decrement of the distribution $F$ (we denote it by $k(F)$ or simply
$k$) we understand the number of bonds in a complete system. We may,
correspondingly, call the difference between the number of variables and the
decrement of the distribution the (linear) rank of the distribution, or the
dimensionality of the linear skeleton.

The decrement (rank) is given by the following

Theorem 1.² The decrement (rank) of the distribution $F$ is equal to the
decrement³ (rank) of the matrix

$\| \mu_{ij} \|, \quad i, j = 1, 2, \cdots, n$

of the moments of the second order of this distribution; that is

(2) $k(F) = k(\| \mu_{ij} \|), \quad i, j = 1, 2, \cdots, n.$

Proof. Consider the form

(3) $v = t_1x_1 + t_2x_2 + \cdots + t_nx_n$

where $t_1, t_2, \cdots, t_n$ are arbitrary real numbers, not all equal to zero. Let

(4) $Q^2 = E(v^2) = \int\!\!\int \cdots \int_{R_n} (t_1x_1 + t_2x_2 + \cdots + t_nx_n)^2\, dF(x_1, x_2, \cdots, x_n)$
$\qquad = \sum_{i,j=1}^{n} t_i t_j \int\!\!\int \cdots \int_{R_n} x_i x_j\, dF(x_1, x_2, \cdots, x_n) = \sum_{i,j=1}^{n} \mu_{ij} t_i t_j.$

$Q^2$ is a non-negative quadratic form in the variables $t_1, t_2, \cdots, t_n$. The
system of values $t_1, t_2, \cdots, t_n$ for which the expression (3) becomes zero is a
double point of the form $Q^2$.

The coordinates of the double point can be found from the system of
homogeneous equations:

(5) $\mu_{11}t_1 + \mu_{12}t_2 + \cdots + \mu_{1n}t_n = 0$
$\quad\ \ \mu_{21}t_1 + \mu_{22}t_2 + \cdots + \mu_{2n}t_n = 0$
$\quad\ \ \cdots$
$\quad\ \ \mu_{n1}t_1 + \mu_{n2}t_2 + \cdots + \mu_{nn}t_n = 0.$

² This theorem was proved by a different method by R. Frisch [1].

³ By the decrement of a (rectangular) matrix we mean, after B. Kagan, the difference
between the number of its rows and its rank.
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It is, however, known that the number of the independent double points of the 
form, Q\ i.e the number of linearly independent untrivial solutions of the 
system (5) is equal to the decrement of the matrix i| || , i,j = 2, . ■ . n. 

Consequently, there exist only fc(ll 11) independent linear connections 
between the variables xi , aia , • • ■ , a:„ , which proves the theorem. 

Hence it follows that the variables Xi, xt, ■ • • , are linearly independent 
(fc(F) = 0) if and only if the form is positive definite and, consequently, the 
discriminant ] /it; 1 of the form is positive. 
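
Theorem 1 can be illustrated numerically. The following is only a minimal sketch, assuming a discrete distribution concentrated with equal weight on a few hypothetical points; the helper names `rank`, `second_moments` and `decrement` are introduced here for illustration and do not come from the paper. Exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def second_moments(points):
    """Matrix of mu_ij = E(x_i x_j) for equally likely points."""
    n = len(points[0])
    count = len(points)
    return [[sum(Fraction(p[i] * p[j]) for p in points) / count
             for j in range(n)] for i in range(n)]


def decrement(points):
    """k(F) = (number of variables) - rank of the second-moment matrix."""
    mu = second_moments(points)
    return len(mu) - rank(mu)


# Hypothetical sample: every point satisfies the bond x1 + x2 - x3 = 0.
points = [(1, 0, 1), (0, 1, 1), (1, 1, 2), (2, -1, 1)]
print(decrement(points))
```

For these points the moment matrix has rank 2, so the sketch reports one linear bond; for points that satisfy no linear relation the same function returns 0.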

The following two theorems may be used for determination of a complete 
system of linear bonds. The first of them is a special case of the second, but is 
stated separately in order to simplify the proof. 

Theorem 2. If k(F) = 1, we obtain the linear bond of the distribution by re- 
placing in the determinant on the left hand side of the equation 



Proof. Since the decrement of the matrix $\|\mu_{ij}\|$, $i, j = 1, 2, \dots, n$ is equal
to 1, for the unique nontrivial independent solution $(t_1, t_2, \dots, t_n)$ of the
system (5) may be taken, as we know, the system of algebraical supplements
of the elements of any row of the determinant $|\mu_{ij}|$, $i, j = 1, 2, \dots, n$. (Among
the algebraical supplements of elements of each row there is at least one different
from zero, since the algebraical supplements of corresponding elements of any
pair of rows are proportional to each other.)

Hence, since $t_1x_1 + t_2x_2 + \cdots + t_nx_n = 0$, the theorem follows.

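Theorem 3. If k(F) > 0, we obtain a complete system of linear bonds of the
distribution F replacing in each of the k equations
$$\begin{vmatrix}
\mu_{ii} & \mu_{i,k+1} & \cdots & \mu_{in}\\
\mu_{k+1,i} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n}\\
\vdots & \vdots & & \vdots\\
\mu_{ni} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0, \qquad i = 1, 2, \dots, k \tag{8}$$
one (arbitrary) row of the determinant respectively by $x_i, x_{k+1}, \dots, x_n$, where
$x_{k+1}, \dots, x_n$ are chosen in such a way that
$$\begin{vmatrix}
\mu_{k+1,k+1} & \cdots & \mu_{k+1,n}\\
\vdots & & \vdots\\
\mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} > 0.$$

Replacing, for example, the first rows, we obtain:
$$\begin{vmatrix}
x_1 & x_{k+1} & \cdots & x_n\\
\mu_{k+1,1} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n}\\
\vdots & \vdots & & \vdots\\
\mu_{n1} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0,$$
$$\begin{vmatrix}
x_2 & x_{k+1} & \cdots & x_n\\
\mu_{k+1,2} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n}\\
\vdots & \vdots & & \vdots\\
\mu_{n2} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0,$$
$$\cdots\cdots\cdots$$
$$\begin{vmatrix}
x_k & x_{k+1} & \cdots & x_n\\
\mu_{k+1,k} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n}\\
\vdots & \vdots & & \vdots\\
\mu_{nk} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0. \tag{9}$$
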

Proof. The theorem is already proved for k(F) = 1 (Theorem 2). We have
to prove it for k(F) > 1.

Let us in the first place show that the matrix $\|\mu_{ij}\|$, $i, j = 1, 2, \dots, n$
possesses at least one positive chief algebraical supplement of the order n - k.

In fact, in the system of n variables $x_1, x_2, \dots, x_n$, connected by k
independent linear relations, there must exist a subsystem of n - k linearly
independent variables. Let these variables be $x_{k+1}, x_{k+2}, \dots, x_n$. The
determinant of the moments of the second order of this subsystem, $|\mu_{ij}|$,
$i, j = k+1, \dots, n$, is different from zero and, by the property of Gram's
determinants, is positive. Further, each of the subsystems $x_i, x_{k+1}, \dots, x_n$ is
subjected to the distribution law $F_i(x_i, x_{k+1}, \dots, x_n)$ with the decrement $k_i = 1$
and, consequently, by Theorem 2, the relations (9) are satisfied. (Arguing
as before we find that any (not necessarily the first) row in each of the
determinants in (8) may be replaced by $x_i, x_{k+1}, \dots, x_n$.)

In order to show the independence of the relations (9), write the system (9)
in the form:
$$\sum_{j} c_{ij}x_j = 0, \qquad i = 1, 2, \dots, k \tag{9'}$$
and consider the matrix of its coefficients:
$$\begin{Vmatrix}
c_{11} & 0 & \cdots & 0 & c_{1,k+1} & c_{1,k+2} & \cdots & c_{1n}\\
0 & c_{22} & \cdots & 0 & c_{2,k+1} & c_{2,k+2} & \cdots & c_{2n}\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & c_{kk} & c_{k,k+1} & c_{k,k+2} & \cdots & c_{kn}
\end{Vmatrix} \tag{10}$$

The matrix (10) has the rank k, since the determinant of order k
$$\begin{vmatrix}
c_{11} & 0 & \cdots & 0\\
0 & c_{22} & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & c_{kk}
\end{vmatrix} = c_{11}c_{22}\cdots c_{kk},$$
belonging to the matrix, is positive; this follows from
$$c_{11} = c_{22} = \cdots = c_{kk} = |\mu_{ij}| > 0, \qquad i, j = k+1, \dots, n.$$

Thus the independence of the relations (9) is proved and the theorem is
established.


§2 

In this section we consider the question of the variation of decrement of the
distribution in the case when the variables are subjected to a linear
transformation.

Let $x_1, x_2, \dots, x_n$ be a system (1) of random variables and
$$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
&\cdots\cdots\cdots\\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned} \tag{11}$$
a system of linear forms in the variables (1).

The distribution function of the variables $u_1, u_2, \dots, u_m$ we denote by $F_1$,
the decrement of the distribution by $k(F_1)$, or, shorter, by $k_1$.

The two systems of equations (11) and (9) form together the system:
$$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
&\cdots\cdots\cdots\\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\\
0 &= a_{m+1,1}x_1 + a_{m+1,2}x_2 + \cdots + a_{m+1,n}x_n\\
&\cdots\cdots\cdots\\
0 &= a_{m+k,1}x_1 + a_{m+k,2}x_2 + \cdots + a_{m+k,n}x_n
\end{aligned} \tag{12}$$
where the last k equations represent, in new notation, the equations (9).

We call the matrix of the coefficients of the variables in the system (12),
$\|a_{ij}\|$, $i = 1, 2, \dots, m+k$; $j = 1, 2, \dots, n$, the elongated matrix of the
transformation.

We prove the following

Theorem 4. The decrement of the distribution $F_1(u_1, u_2, \dots, u_m)$ is equal to the
decrement of the elongated matrix of the transformation:
$$k(F_1) = k(\|a_{ij}\|), \qquad i = 1, 2, \dots, m+k; \quad j = 1, 2, \dots, n. \tag{13}$$




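Proof. Consider a system of forms in arbitrary linearly independent
parameters $\xi_1, \xi_2, \dots, \xi_n$:
$$\begin{aligned}
v_1 &= a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1n}\xi_n\\
v_2 &= a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2n}\xi_n\\
&\cdots\cdots\cdots\\
v_m &= a_{m1}\xi_1 + a_{m2}\xi_2 + \cdots + a_{mn}\xi_n\\
v_{m+1} &= a_{m+1,1}\xi_1 + a_{m+1,2}\xi_2 + \cdots + a_{m+1,n}\xi_n\\
&\cdots\cdots\cdots\\
v_{m+k} &= a_{m+k,1}\xi_1 + a_{m+k,2}\xi_2 + \cdots + a_{m+k,n}\xi_n
\end{aligned} \tag{14}$$
such that the matrix of the system (14) coincides with the elongated matrix of
the transformation.

For
$$v_{m+1} = 0, \quad \dots, \quad v_{m+k} = 0 \tag{15}$$
the system (14) reduces to the system (12).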

If the decrement of the matrix of the system is equal to s, there exist only
m + k - s linearly independent forms $v_i$, and each of the remaining s forms is
a linear combination of the first.

By Steinitz's theorem we can always include in a subsystem of independent
forms the forms $v_{m+1}, \dots, v_{m+k}$ (since these forms are independent).

Denoting all forms of the subsystem by $v_{s+1}, \dots, v_m, v_{m+1}, \dots, v_{m+k}$, let
us write the s relations connecting each of the remaining forms with the forms
of our subsystem in the form:
$$\begin{aligned}
g_{11}v_1 + g_{1,s+1}v_{s+1} + \cdots + g_{1m}v_m + g_{1,m+1}v_{m+1} + \cdots + g_{1,m+k}v_{m+k} &= 0\\
g_{22}v_2 + g_{2,s+1}v_{s+1} + \cdots + g_{2m}v_m + g_{2,m+1}v_{m+1} + \cdots + g_{2,m+k}v_{m+k} &= 0\\
\cdots\cdots\cdots &\\
g_{ss}v_s + g_{s,s+1}v_{s+1} + \cdots + g_{sm}v_m + g_{s,m+1}v_{m+1} + \cdots + g_{s,m+k}v_{m+k} &= 0
\end{aligned} \tag{16}$$
where $g_{11}, g_{22}, \dots, g_{ss} \neq 0$.

Assigning to the variables in these equations the values (15) we clearly
obtain s linear relations between the variables $u_1, u_2, \dots, u_m$:
$$\begin{aligned}
g_{11}u_1 + g_{1,s+1}u_{s+1} + \cdots + g_{1m}u_m &= 0\\
g_{22}u_2 + g_{2,s+1}u_{s+1} + \cdots + g_{2m}u_m &= 0\\
\cdots\cdots\cdots &\\
g_{ss}u_s + g_{s,s+1}u_{s+1} + \cdots + g_{sm}u_m &= 0.
\end{aligned} \tag{17}$$

The equations (17) are linearly independent, since the matrix of the
system (17)
$$\begin{Vmatrix}
g_{11} & 0 & \cdots & 0 & g_{1,s+1} & \cdots & g_{1m}\\
0 & g_{22} & \cdots & 0 & g_{2,s+1} & \cdots & g_{2m}\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & g_{ss} & g_{s,s+1} & \cdots & g_{sm}
\end{Vmatrix}$$
has the rank s; this follows from the fact that it contains the determinant
$$\begin{vmatrix}
g_{11} & 0 & \cdots & 0\\
0 & g_{22} & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & g_{ss}
\end{vmatrix} = g_{11}g_{22}\cdots g_{ss}$$
of the order s, which is different from zero.

We proceed now to prove that there exists no other linear relation between
the variables u, linearly independent of the relations (17).

From the equations (17) the variables $u_1, \dots, u_s$ may be determined as
linear combinations of the variables $u_{s+1}, \dots, u_m$ (we suppose that m > s,
since for m = s the proposition under consideration is trivial).

It is thus to be proved that the variables $u_{s+1}, \dots, u_m$ are linearly
independent (since every new linear relation between the variables $u_1, u_2, \dots, u_m$,
independent of (17), must, after corresponding substitutions, lead to a linear
relation between $u_{s+1}, \dots, u_m$).

In the equations (14) the linearly independent variables $v_{s+1}, \dots, v_{m+k}$ are
linear forms in n linearly independent parameters $\xi_1, \xi_2, \dots, \xi_n$.

We may, instead of the $\xi_1, \xi_2, \dots, \xi_n$, take for the system of linearly
independent parameters $v_{m+1}, \dots, v_{m+k}, \xi_{k+1}, \dots, \xi_n$ (changing the indices of the
$\xi$ in an appropriate manner*), defining $\xi_1, \dots, \xi_k$ by the system of
equations
$$\begin{aligned}
v_{m+1} &= a_{m+1,1}\xi_1 + \cdots + a_{m+1,n}\xi_n\\
&\cdots\cdots\cdots\\
v_{m+k} &= a_{m+k,1}\xi_1 + \cdots + a_{m+k,n}\xi_n
\end{aligned}$$
which is always possible, since the forms $v_{m+1}, \dots, v_{m+k}$ are independent.

Substituting the expressions obtained for the $\xi_1, \xi_2, \dots, \xi_k$ into the forms
$v_{s+1}, \dots, v_m$, we find
$$\begin{aligned}
v_{s+1} &= \varphi_{s+1}(v_{m+1}, \dots, v_{m+k}) + \psi_{s+1}(\xi_{k+1}, \dots, \xi_n)\\
&\cdots\cdots\cdots\\
v_m &= \varphi_m(v_{m+1}, \dots, v_{m+k}) + \psi_m(\xi_{k+1}, \dots, \xi_n)
\end{aligned} \tag{18}$$
where $\varphi$ and $\psi$ are linear forms in the corresponding arguments.

The variables $v_{s+1}, \dots, v_m$ remain, of course, independent.

* The indices of the $\xi$ adequately chosen.

Performing in the equations (18) the substitution (15), we obtain:
$$\begin{aligned}
u_{s+1} &= \psi_{s+1}(\xi_{k+1}, \dots, \xi_n)\\
&\cdots\cdots\cdots\\
u_m &= \psi_m(\xi_{k+1}, \dots, \xi_n).
\end{aligned} \tag{19}$$

If there exists a linear dependence between the $u_{s+1}, \dots, u_m$, we can find
$\alpha_{s+1}, \dots, \alpha_m$, not all equal to zero, such that
$$\alpha_{s+1}u_{s+1} + \cdots + \alpha_mu_m = 0. \tag{20}$$

Multiplying the equations (18) by the coefficients $\alpha_{s+1}, \dots, \alpha_m$ respectively,
and adding, we obtain, by virtue of (19) and (20),
$$\alpha_{s+1}v_{s+1} + \cdots + \alpha_mv_m = \alpha_{s+1}\varphi_{s+1}(v_{m+1}, \dots, v_{m+k}) + \cdots + \alpha_m\varphi_m(v_{m+1}, \dots, v_{m+k}),$$
i.e., the variables $v_{s+1}, \dots, v_{m+k}$ are linearly dependent, which contradicts the
assumption.

The required proposition is thus proved.

It follows that the s equations (17) form a complete system of bonds of the
distribution $F_1$, which proves our theorem.
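
As a small illustration of Theorem 4, the sketch below computes the decrement of an elongated matrix as (number of rows) minus (rank). It is only a sketch under stated assumptions: the transformation and the single bond are hypothetical values chosen for the example, and `rank` and `decrement` are helper names introduced here, not notation from the paper.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def decrement(matrix):
    """Decrement of a rectangular matrix: number of rows minus rank."""
    return len(matrix) - rank(matrix)


# Hypothetical example with n = 3 variables and one bond (k = 1):
# x1 + x2 - x3 = 0.
bonds = [[1, 1, -1]]
# Transformation with m = 2 forms: u1 = x1 + x2, u2 = x3.
transformation = [[1, 1, 0], [0, 0, 1]]
elongated = transformation + bonds

print(decrement(elongated))       # decrement of the distribution of (u1, u2)
print(decrement(transformation))  # the same forms, if the x were free of bonds
```

Here the elongated matrix has three rows and rank 2, so its decrement is 1, in agreement with the evident bond u1 - u2 = 0; without the bond among the x the two forms would be independent.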

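The moments of the second order of the distribution $F_1$ are connected with
the moments of the distribution F by the following formulae:
$$E(u_iu_j) = E\Bigl(\sum_{r=1}^{n} a_{ir}x_r \sum_{s=1}^{n} a_{js}x_s\Bigr) = \sum_{r,s=1}^{n} a_{ir}a_{js}E(x_rx_s) = \sum_{r,s=1}^{n} a_{ir}a_{js}\mu_{rs}, \qquad i, j = 1, 2, \dots, m. \tag{21}$$
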


§3 

Let the normal law of distribution G (generalized by H. Cramér) be given
by its multidimensional characteristic function [2], [3]:
$$f(t_1, t_2, \dots, t_n) = \int\!\!\int\!\cdots\!\int e^{i(t_1x_1 + t_2x_2 + \cdots + t_nx_n)} \, dd\cdots dG(x_1, x_2, \dots, x_n) = e^{-\frac{1}{2}Q^2}, \tag{22}$$
where $Q^2 = \sum_{r,s=1}^{n} c_{rs}t_rt_s$ ($c_{rs} = c_{sr}$) is a non-negative quadratic form in the real
variables $t_1, t_2, \dots, t_n$. (The integrals, as above, to be understood in the
sense of Lebesgue-Radon.)

As is easily seen, the coefficients $c_{rs}$ are the moments of the second order of
the distribution G for which



If $Q^2$ is positive definite, we have a proper normal distribution.

If $Q^2$ is merely non-negative (not positive definite), the distribution G
possesses a positive decrement.

The decrement and the linear bonds of the distribution may be determined
from the matrix of the coefficients $\|c_{rs}\|$, $r, s = 1, 2, \dots, n$, on ground of the
general theorems of §1.

Let, as before,
$$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
&\cdots\cdots\cdots\\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned} \tag{11}$$
be a system of linear forms in the variables $x_1, x_2, \dots, x_n$. We shall prove
the following

Theorem 5. The variables $u_1, u_2, \dots, u_m$ are subject to the generalized
normal distribution law the decrement of which is equal to the decrement of the
elongated matrix of the transformation
$$\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}\\
a_{m+1,1} & a_{m+1,2} & \cdots & a_{m+1,n}\\
\vdots & \vdots & & \vdots\\
a_{m+k,1} & a_{m+k,2} & \cdots & a_{m+k,n}
\end{Vmatrix}.$$


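Proof. Consider the characteristic function of the distribution
$G_1(u_1, u_2, \dots, u_m)$:
$$f_1(t_1, t_2, \dots, t_m) = \int\!\!\int\!\cdots\!\int e^{i(t_1u_1 + t_2u_2 + \cdots + t_mu_m)} \, dd\cdots dG_1(u_1, u_2, \dots, u_m). \tag{23}$$

Performing in this expression the substitution (11), we obtain
$$\begin{aligned}
f_1(t_1, t_2, \dots, t_m) &= \int\!\!\int\!\cdots\!\int e^{i\sum_{p=1}^{m} t_p \sum_{r=1}^{n} a_{pr}x_r} \, dd\cdots dG_1\Bigl(\sum_{r=1}^{n} a_{1r}x_r, \sum_{r=1}^{n} a_{2r}x_r, \dots, \sum_{r=1}^{n} a_{mr}x_r\Bigr)\\
&= \int\!\!\int\!\cdots\!\int e^{i\sum_{r=1}^{n} x_r \sum_{p=1}^{m} a_{pr}t_p} \, dd\cdots dG(x_1, x_2, \dots, x_n)
\end{aligned} \tag{24}$$
($dd\cdots dG(x_1, x_2, \dots, x_n)$ in the expression (24) does not, in general, coincide
with $dd\cdots dG(x_1, x_2, \dots, x_n)$ in the expression (22)).

Taking into account (22), we obtain
$$f_1(t_1, t_2, \dots, t_m) = e^{-\frac{1}{2}Q_1^2}, \tag{25}$$
where
$$\begin{aligned}
Q_1^2 &= \sum_{r,s=1}^{n} c_{rs}\Bigl(\sum_{p=1}^{m} a_{pr}t_p\Bigr)\Bigl(\sum_{q=1}^{m} a_{qs}t_q\Bigr)
= \sum_{r,s=1}^{n} \sum_{p,q=1}^{m} c_{rs}a_{pr}a_{qs}t_pt_q\\
&= \sum_{p,q=1}^{m} t_pt_q \sum_{r,s=1}^{n} a_{pr}a_{qs}c_{rs}
= \sum_{p,q=1}^{m} E(u_pu_q)\,t_pt_q.
\end{aligned} \tag{26}$$

$Q_1^2$ is a non-negative quadratic form in $t_1, t_2, \dots, t_m$, the coefficients of
which coincide with the moments of the second order of the distribution
$G_1(u_1, u_2, \dots, u_m)$.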

Consequently, the distribution Gi is a generalized normal distribution. 

By Theorem 4 the decrement of the distribution $G_1$ is equal to the decrement
of the matrix $\|a_{pr}\|$, $p = 1, 2, \dots, m+k$; $r = 1, 2, \dots, n$, the last k rows
of which consist of the coefficients of the complete system of linear bonds of
the distribution G.

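Let now $x_1, x_2, \dots, x_n$ be a system of random variables subjected to a
proper Gaussian law. The density function of the distribution of the system is
$$\frac{1}{(2\pi)^{n/2}\,\prod_{i=1}^{n}\sqrt{\mu_{ii}}\;\sqrt{R}}\;
\exp\Bigl\{-\frac{1}{2R}\sum_{i,j=1}^{n} R_{ij}\,\frac{x_ix_j}{\sqrt{\mu_{ii}\mu_{jj}}}\Bigr\},$$
where
$$R = \begin{vmatrix}
1 & r_{12} & \cdots & r_{1n}\\
r_{21} & 1 & \cdots & r_{2n}\\
\vdots & \vdots & & \vdots\\
r_{n1} & r_{n2} & \cdots & 1
\end{vmatrix},$$
$R_{ij}$ are the algebraical supplements in R, $r_{ij} = \dfrac{\mu_{ij}}{\sqrt{\mu_{ii}\mu_{jj}}}$, and
$\dfrac{1}{R}\sum_{i,j=1}^{n} R_{ij}\dfrac{x_ix_j}{\sqrt{\mu_{ii}\mu_{jj}}}$ is a positive
definite quadratic form in the variables $x_1, x_2, \dots, x_n$.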

Again let $u_1, u_2, \dots, u_m$ be a system of linear forms in the variables
$x_1, x_2, \dots, x_n$:
$$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
&\cdots\cdots\cdots\\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n.
\end{aligned} \tag{11}$$

Then from Theorem 5 follows the

Corollary. The random variables $u_1, u_2, \dots, u_m$ are subject to the
m-dimensional properly normal distribution law of Gauss if and only if the matrix
$$\begin{Vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{Vmatrix}$$
of the system of forms (11) has the rank m.
ON A CLASS OF DISTRIBUTIONS THAT APPROACH THE NORMAL
DISTRIBUTION FUNCTION¹


By George B. Dantzig


1. Formulation of the Problem. An important property of a sequence of
binomial coefficients is that, when suitably normalized and transformed, it
converges to the normal distribution.² The object of this paper is to exhibit a
large class of other sequences which also possess this property.

The Pascal recurrence formula may be taken as the defining property of the
binomial coefficients. Let the combination of n things taken x at a time be
denoted by $\binom{n}{x}$. If we set $f_n(x) = \binom{n}{x}/2^n$ for $0 \le x \le n$ and $f_n(x) = 0$ for
$x < 0$ or $x > n$, then $f_n(x)$ is defined for all integers x. With this notation
Pascal's recurrence formula, $\binom{n}{x} = \binom{n-1}{x} + \binom{n-1}{x-1}$, may be written
$$f_n(x) = \tfrac{1}{2}\,[f_{n-1}(x) + f_{n-1}(x-1)], \tag{1}$$
where this new form is valid for all integers x extending from $-\infty$ to $+\infty$.

In order to generalize, we may consider a sequence of distributions $f_1(x)$,
$f_2(x), \dots, f_n(x), \dots$ each defined in terms of the preceding one by means of the
recurrence formula
$$f_n(x) = \frac{1}{a_n+1}\,[f_{n-1}(x-0) + f_{n-1}(x-1) + f_{n-1}(x-2) + \cdots + f_{n-1}(x-a_n)], \tag{2}$$
where the x are integers, and $a_n$ is a positive integer which may change in value
from one distribution to the next. The problem is to find conditions under
which $f_n(x)$, in normalized form, approaches the normal distribution. The
normalization of $f_n(x)$ is effected by the affine transformation
$$u = \frac{x - \bar{x}_n}{\sigma_n}, \qquad \varphi_n(u) = f_n(x), \tag{3}$$


¹ Presented November 21, 1938 before a joint meeting of the Columbia Mathematics
Club and the Statistical Seminar of the Graduate School of the Department of Agriculture;
also December 10, 1938 before a meeting of the American Mathematical Association at the
University of Maryland.

² Due to DeMoivre, 1731. By a variable distribution approaching the normal
distribution, we mean that the integral under the variable distribution between any two
limits approaches the corresponding integral under the normal curve.
where $\bar{x}_n$ and $\sigma_n$ are the mean and standard deviation of the distribution $f_n(x)$.
The normal (cumulative) distribution function is taken in the standard form
$$\varphi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-x^2/2}\,dx. \tag{4}$$

The theorem whose proof forms the theme of this paper may be stated as
follows:

Theorem: A necessary and sufficient condition that $\varphi_n(u) \to \varphi(u)$ as $n \to \infty$
is that $\Gamma = 0$, where
$$\Gamma = \lim_{n\to\infty} \sum_{i=2}^{n} \gamma_i^2 \Bigm/ \Bigl(\sum_{i=2}^{n} \gamma_i\Bigr)^2; \qquad 4\gamma_i = a_i^2 + 2a_i. \tag{5}$$


2. Liapounoff Condition; the general case. The recurrence formula (2) is a
special case of the most general linear recurrence formula
$$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i)\,f_{n-1}(x-i), \tag{6}$$
where $g_n(i)$ are a given set of weight functions generating the sequence $f_1(x)$,
$f_2(x), \dots, f_n(x), \dots$. We may form the recurrence formula (2) by setting
$$g_n(i) = \frac{1}{a_n+1} \quad \text{if } 0 \le i \le a_n, \qquad g_n(i) = 0 \quad \text{if } i < 0 \text{ or } i > a_n. \tag{7}$$

Let $F_k(t) = \sum_{x<t} f_k(x)$ express³ the probability that a variable $x_k < t$, where the
distribution function of $x_k$ is defined as $f_k(x)$; and in a similar manner let the
probability that a variable $s_k < t$ be given by $G_k(t) = \sum_{x<t} g_k(x)$. By summing
$f_n(x)$ for all x less than t, we obtain
$$F_n(t) = \sum_{i=-\infty}^{+\infty} F_{n-1}(t-i)\,g_n(i) = \int_{-\infty}^{+\infty} F_{n-1}(t-i)\,dG_n(i), \tag{8}$$
where we have replaced the summation by a Stieltjes integral. In the latter
form the integral gives, in general, the probability that the sum of two
independent variables $x_{n-1}$ and $s_n$ is less than t. From the above equation we see
that the probability that $x_{n-1} + s_n < t$ is the same as that of $x_n < t$, so that
we may set $x_n = x_{n-1} + s_n$. By iteration one obtains
$$x_n = s_1 + s_2 + \cdots + s_n \tag{9}$$
for all n. Thus we have established that if a distribution function of a variable
$s_k$ is defined as $g_k(x)$, then the distribution function of the sum
$s_1 + s_2 + \cdots + s_n = x_n$ is $f_n(x)$.


³ The summation extends over all values x less than t.
The limit of the distribution function of the sum of n independent variables
as $n \to \infty$ has been considered by Laplace, Liapounoff, Lindeberg, and others.
We shall make use of a sufficient condition given by Liapounoff that the
normalized distribution function of $x_n$ approaches $\varphi(u)$.

Laplace-Liapounoff Theorem:⁴ A sufficient condition for the normalized
distribution function of the sum of n independent variables $s_1, s_2, \dots, s_n$ to
approach the normal distribution function with increasing n is $\Gamma' = 0$, where
$$\Gamma' = \lim_{n\to\infty} \frac{M_4(1) + M_4(2) + \cdots + M_4(n)}{[M_2(1) + M_2(2) + \cdots + M_2(n)]^2}, \tag{10}$$
and where $M_2(k)$ and $M_4(k)$ are defined as the second and fourth moments of
$s_k$ whose distribution is $g_k(x)$.

Thus we have shown that if a sequence of distributions $f_n(x)$ is defined by the
general linear recurrence formula (6),
$$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i)\,f_{n-1}(x-i),$$
then a sufficient condition that $\varphi_n(u) \to \varphi(u)$ as $n \to \infty$ is given by $\Gamma' = 0$,
where $\varphi_n(u)$ is the normalized form of $f_n(x)$.


3. Sufficiency of the Condition $\Gamma = 0$. We may simplify the condition
$\Gamma' = 0$ for the more restricted case of a sequence of distributions defined by the
recurrence formula (2). In general, the second and fourth moments of $g_k(x)$
are given by
$$M_2(k) = \sum_{x=-\infty}^{+\infty} g_k(x)(x - \bar{s}_k)^2, \qquad
M_4(k) = \sum_{x=-\infty}^{+\infty} g_k(x)(x - \bar{s}_k)^4, \tag{11}$$
where $\bar{s}_k$ is the mean value of the distribution. Equations (7) give the special
values of $g_k(x)$; substituting these values in (11), and remembering the Bernoulli
summation by which $1^p + 2^p + 3^p + \cdots + n^p$ may be expressed as a
polynomial in n of degree p + 1, we obtain
$$\begin{aligned}
M_2(k) &= \sum_{i=0}^{a_k} \frac{1}{a_k+1}\Bigl(i - \frac{a_k}{2}\Bigr)^2 = \frac{a_k^2 + 2a_k}{12} = \frac{1}{3}\gamma_k,\\
M_4(k) &= \sum_{i=0}^{a_k} \frac{1}{a_k+1}\Bigl(i - \frac{a_k}{2}\Bigr)^4
= \frac{1}{5}\Bigl[\frac{a_k^2 + 2a_k}{4}\Bigr]^2 - \frac{1}{15}\Bigl[\frac{a_k^2 + 2a_k}{4}\Bigr]
= \frac{1}{5}\gamma_k^2 - \frac{1}{15}\gamma_k,
\end{aligned} \tag{12}$$


⁴ J. V. Uspensky, Introduction to Mathematical Probability (McGraw-Hill, 1937), pages
284-292; the theorem is proved there by the method of characteristic functions.
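whence by substitution in (10), $\Gamma'$ becomes
$$\Gamma' = \lim_{n\to\infty} \frac{\dfrac{1}{5}\sum_{i=2}^{n}\gamma_i^2 - \dfrac{1}{15}\sum_{i=2}^{n}\gamma_i + M_4(1)}{\Bigl[\dfrac{1}{3}\sum_{i=2}^{n}\gamma_i + M_2(1)\Bigr]^2}. \tag{13}$$

Since $a_i \ge 1$, $\gamma_i \ge 3/4$, and thus $\sum_{i=2}^{n}\gamma_i \to \infty$ as $n \to \infty$, we may reduce $\Gamma'$ in
the limit to
$$\Gamma' = \frac{9}{5}\lim_{n\to\infty} \sum_{i=2}^{n}\gamma_i^2 \Bigm/ \Bigl[\sum_{i=2}^{n}\gamma_i\Bigr]^2. \tag{14}$$

Since $\Gamma' = \frac{9}{5}\Gamma$, the Liapounoff condition $\Gamma' = 0$ for normality becomes
by (5), $\Gamma = 0$.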
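
The closed forms for the moments of the uniform weights in (7) can be checked by direct summation. The following is a minimal sketch, assuming weights 1/(a + 1) on the integers 0, 1, ..., a; the function names `central_moment` and `gamma` are introduced here for illustration only.

```python
from fractions import Fraction


def central_moment(a, power):
    """Moment about the mean of the weights g(i) = 1/(a+1), i = 0, ..., a."""
    mean = Fraction(a, 2)
    return sum((i - mean) ** power for i in range(a + 1)) / (a + 1)


def gamma(a):
    """gamma defined by 4*gamma = a^2 + 2a, as in (5)."""
    return Fraction(a * a + 2 * a, 4)


# Compare direct summation with M2 = gamma/3 and M4 = gamma^2/5 - gamma/15.
for a in range(1, 30):
    g = gamma(a)
    assert central_moment(a, 2) == g / 3
    assert central_moment(a, 4) == g * g / 5 - g / 15

print(central_moment(1, 2), central_moment(1, 4))
```

For a = 1 (the binomial case) this gives second moment 1/4 and fourth moment 1/16, with gamma = 3/4.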


4. Necessity of the Condition $\Gamma = 0$. A necessary condition for normality
can be found by noting that if $\varphi_n(u)$ approaches $\varphi(u)$, then the moments of
$\varphi_n(u)$ must approach the corresponding moments of $\varphi(u)$.⁵ Letting $\mu_4(n)$ be
the 4th moment of $\varphi_n(u)$ and $\mu_4$ the corresponding moment of the normal curve,
a necessary condition is that $\mu_4(n) \to \mu_4$ as $n \to \infty$, and $\mu_4 = 3$. The 4th
moment of $\varphi_n(u)$ may be expressed simply in terms of the moments of $f_n(x)$.
If the symbol E stands for expected value, the second and fourth moments of
$f_n(x)$ are $E(x_n - \bar{x}_n)^2$ and $E(x_n - \bar{x}_n)^4$ respectively, and the relationship is then
$$\mu_4(n) = \frac{E(x_n - \bar{x}_n)^4}{[E(x_n - \bar{x}_n)^2]^2}
= \frac{E\Bigl[\sum_{i=1}^{n}(s_i - \bar{s}_i)\Bigr]^4}{\Bigl\{E\Bigl[\sum_{i=1}^{n}(s_i - \bar{s}_i)\Bigr]^2\Bigr\}^2}. \tag{15}$$

Expanding the sums by the multinomial theorem and taking the expected value
of each term we obtain
$$E(x_n - \bar{x}_n)^2 = \sum_{i=1}^{n} E(s_i - \bar{s}_i)^2 + 2\sum_{i<j} E(s_i - \bar{s}_i)\,E(s_j - \bar{s}_j) = \sum_{i=1}^{n} M_2(i), \tag{16}$$
where $M_2(i)$ is the second moment of $g_i(x)$. In a similar manner we have
$$\begin{aligned}
E(x_n - \bar{x}_n)^4 &= \sum_{i=1}^{n} M_4(i) + 6\sum_{i<j} M_2(i)M_2(j)\\
&= \sum_{i=1}^{n} M_4(i) + 3\Bigl[\sum_{i=1}^{n} M_2(i)\Bigr]^2 - 3\sum_{i=1}^{n} M_2^2(i);
\end{aligned} \tag{17}$$


⁵ Uspensky, loc. cit., pages 383-388.
whence
$$\mu_4(n) = 3 + \frac{\sum_{i=1}^{n} M_4(i) - 3\sum_{i=1}^{n} M_2^2(i)}{\Bigl[\sum_{i=1}^{n} M_2(i)\Bigr]^2}. \tag{18}$$

Since a necessary condition for normality is that $\lim \mu_4(n) = \mu_4 = 3$, the
fraction in the above equation must in the limit approach zero. Substituting
$M_2(i) = \frac{1}{3}\gamma_i$, $M_4(i) = \frac{1}{5}\gamma_i^2 - \frac{1}{15}\gamma_i$, we find that this ratio reduces
immediately in the limit to the condition $\Gamma = 0$.

5. Application to the Distribution of Inversions. A frequency table may be
set up for the number of permutations of n objects that give rise to a fixed
number of inversions. Three objects marked 1, 2, 3 may be permuted in
6 ways:

(123), (132), (213), (231), (312), (321).

If (123) is taken as standard position, the number of inversions associated
with the above set to bring each one into standard position are respectively
0, 1, 1, 2, 2, 3. Thus we pass from (321) to (123) by the following three
inversions or adjacent interchanges: (312), (132), (123). Among the six
permutations there is one giving rise to 0 inversions, two having 1 inversion, two having
2 inversions, and one having 3 inversions.

The distribution of inversions finds its application in a test of significance.
The standard position is taken as a hypothesis of rank order, and the difference
between an observed set of ranks and the hypothetical one is measured by the
number of inversions. The distribution may then be used for finding the
probability of obtaining by chance the number of inversions found, or less.
For a moderate number of ranks (six or more), the distribution of inversions may
be approximated by a normal curve. We shall show that as the number of
ranks is increased, the normalized distribution of inversions approaches the
normal distribution. The distribution of inversions of 1, 2, 3, 4 objects will
be found in the table below.

Inversions: x        0   1   2   3   4   5   6
1·f_1(x)             1
1·2·f_2(x)           1   1
1·2·3·f_3(x)         1   2   2   1
1·2·3·4·f_4(x)       1   3   5   6   5   3   1
By induction one may show that the following relationships hold between
successive distributions:
$$\begin{aligned}
f_2(x) &= \tfrac{1}{2}\,[f_1(x-0) + f_1(x-1)],\\
f_3(x) &= \tfrac{1}{3}\,[f_2(x-0) + f_2(x-1) + f_2(x-2)],\\
&\cdots\cdots\cdots\\
f_n(x) &= \tfrac{1}{n}\,[f_{n-1}(x-0) + f_{n-1}(x-1) + f_{n-1}(x-2) + \cdots + f_{n-1}(x-n+1)].
\end{aligned} \tag{19}$$

Since this satisfies the basic recurrence formula (2), where $a_n = n - 1$, we may
find out whether the normalized distribution of inversions approaches $\varphi(u)$.
With $4\gamma_n = n^2 - 1$ the condition $\Gamma = 0$ becomes
$$\lim_{n\to\infty} \sum_{i=2}^{n} (i^2 - 1)^2 \Bigm/ \Bigl[\sum_{i=2}^{n} (i^2 - 1)\Bigr]^2 = 0.$$
The numerator sums to a polynomial of the 5th degree in n, while the bracket
of the denominator sums to a 3d degree polynomial, which after squaring is of
the 6th degree; so that as $n \to \infty$ we have in the limit $\Gamma = 0$. Thus the
normalized distribution function of the inversions of n objects approaches $\varphi(u)$
as $n \to \infty$.

Equations (12) and (16) permit us to find the mean and standard deviation
of the distribution of the inversions of n objects:
$$\bar{x}_n = \tfrac{1}{4}\,n(n-1), \qquad \sigma_n^2 = \tfrac{1}{72}\,n(n-1)(2n+5). \tag{20}$$
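
The recurrence for the inversions can be run directly to reproduce the table and to check the mean and variance. This is a minimal sketch, assuming each distribution is stored as a list of exact fractions indexed by x = 0, 1, 2, ...; the function names are introduced here for illustration only.

```python
from fractions import Fraction


def next_distribution(prev, a):
    """Recurrence (2): f_n(x) = (1/(a+1)) * sum over i = 0..a of f_{n-1}(x - i)."""
    out = [Fraction(0)] * (len(prev) + a)
    for x, p in enumerate(prev):
        for i in range(a + 1):
            out[x + i] += p / (a + 1)
    return out


def inversion_distribution(n):
    """Distribution of the number of inversions of n objects, with a_k = k - 1."""
    f = [Fraction(1)]  # one object: zero inversions with probability 1
    for k in range(2, n + 1):
        f = next_distribution(f, k - 1)
    return f


def mean_and_variance(f):
    mean = sum(x * p for x, p in enumerate(f))
    variance = sum((x - mean) ** 2 * p for x, p in enumerate(f))
    return mean, variance


f4 = inversion_distribution(4)
print([int(p * 24) for p in f4])  # frequencies 4! * f_4(x)
print(mean_and_variance(f4))
```

For four objects the frequencies agree with the last row of the table, and the mean 3 and variance 13/6 agree with n(n - 1)/4 and n(n - 1)(2n + 5)/72.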


The sequence of binomial coefficients, and the distribution of inversions are
examples of sequences that satisfy recurrence relation (2); it should be noted
that their respective values of $\gamma_n$ ($\gamma_n = 3/4$ or $4\gamma_n = n^2 - 1$) may be considered
as bounded between two polynomials of the same degree in n. Whenever this
is true the condition $\Gamma = 0$ will hold and $\varphi_n(u)$ will approach $\varphi(u)$. On the
other hand, if for example, $\gamma_n = r^n$, then $\Gamma \neq 0$ and $\varphi_n(u)$ does not approach $\varphi(u)$.
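
The behaviour of the ratio in (5) for these three cases can be seen from its partial values. The sketch below is illustrative only: it evaluates the ratio for a finite n rather than the limit, and the geometric case takes r = 2 as an assumed example value.

```python
from fractions import Fraction


def gamma_ratio(gammas):
    """Partial value of (5): sum of gamma_i^2 over the square of the sum of gamma_i."""
    total = sum(gammas)
    return sum(g * g for g in gammas) / (total * total)


def binomial_gammas(n):
    """a_i = 1 for i = 2..n, so gamma_i = 3/4."""
    return [Fraction(3, 4)] * (n - 1)


def inversion_gammas(n):
    """a_i = i - 1 for i = 2..n, so 4*gamma_i = i^2 - 1."""
    return [Fraction(i * i - 1, 4) for i in range(2, n + 1)]


def geometric_gammas(n, r=2):
    """gamma_i = r^i for i = 2..n (r = 2 is an assumed example value)."""
    return [Fraction(r) ** i for i in range(2, n + 1)]


for n in (10, 100, 1000):
    print(n,
          float(gamma_ratio(binomial_gammas(n))),
          float(gamma_ratio(inversion_gammas(n))),
          float(gamma_ratio(geometric_gammas(n))))
```

The first two columns shrink toward zero as n grows, while the geometric column settles near (r - 1)/(r + 1), which is 1/3 for r = 2.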


6. Smoothing Formulas. The general recurrence formula (6),
$$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i)\,f_{n-1}(x-i),$$
may be considered as a linear smoothing formula. For example, we may obtain
the usual three point smoothing formula based on binomial coefficients for
smoothing a distribution $f_1(x)$ into $f_2(x)$ by setting in the above equation n = 2,
$g_2(i) = \frac{1}{4}\binom{2}{i+1}$ for $-1 \le i \le +1$, and $g_2(i) = 0$ for $i < -1$ or $i > +1$. Thus
$$f_2(x) = \tfrac{1}{4}\,[f_1(x+1) + 2f_1(x) + f_1(x-1)]. \tag{21}$$
From considerations found in Section 2, we see that if a variable $x_1$ has for
distribution $f_1(x)$ and a variable $s_2$ has for distribution $g_2(x)$, then their sum
$x_1 + s_2$ has for distribution function the smoothed distribution $f_2(x)$. From
this point of view, the smoothed distribution $f_2(x)$, obtained by applying a
linear smoothing formula, is a “cross” between the original unsmoothed
distribution $f_1(x)$ and the artificial weight distribution $g_2(x)$.

Often a smoothing formula is used several times; first on the original
distribution, then on the smoothed distribution, and then sometimes on the smoothed
smoothed distribution. If a linear smoothing formula is thus iterated 1, 2,
..., n, ... times, the sequence of smoothed distributions obtained upon
normalization approaches $\varphi(u)$. This may easily be demonstrated by showing that
Liapounoff's condition for normality, $\Gamma' = 0$, is satisfied. Since in this case
the weight distribution $g_n(i)$ is the same for all $n \ge 2$, the corresponding moments
of these distributions must all be equal; thus we may write $M_2(n) = M_2(2)$
and $M_4(n) = M_4(2)$ where $n \ge 2$. Substituting in (10), we obtain for $\Gamma'$
$$\Gamma' = \lim_{n\to\infty} \frac{M_4(1) + (n-1)M_4(2)}{[M_2(1) + (n-1)M_2(2)]^2} = 0, \tag{22}$$
where $M_2(1)$ and $M_4(1)$ are the 2d and 4th moments of the unsmoothed
distribution $f_1(x)$. The mean value $\bar{x}_n$ and the standard deviation $\sigma_n$ of the
distribution $f_n(x)$ formed by iterating a smoothing formula n - 1 times are easily shown
to be
$$\bar{x}_n = \bar{x}_1 + (n-1)\,\bar{s}_w, \qquad \sigma_n^2 = \sigma_1^2 + (n-1)\,\sigma_w^2,$$
where $\bar{x}_1$ and $\sigma_1$ are the mean and standard deviation of the original unsmoothed
distribution, and where $\bar{s}_w$ and $\sigma_w$ are the mean and standard deviation of the
weight distribution $g_2(i)$.
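
The effect of iterating the three point formula (21) can be followed numerically. This is a minimal sketch, assuming a distribution stored as a dictionary from integer x to probability; the starting distribution (equal mass at 0 and 3) is a hypothetical choice made only to have a clearly non-normal start, and the function names are introduced here for illustration.

```python
from fractions import Fraction

# Weights of formula (21): 1/4, 1/2, 1/4 at i = -1, 0, +1.
WEIGHTS = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def smooth(f):
    """One application of (21): f_2(x) = sum over i of g_2(i) * f_1(x - i)."""
    out = {}
    for x, p in f.items():
        for i, w in WEIGHTS.items():
            out[x + i] = out.get(x + i, Fraction(0)) + w * p
    return out


def moments(f):
    """Mean, second and fourth moments about the mean."""
    mean = sum(x * p for x, p in f.items())
    m2 = sum((x - mean) ** 2 * p for x, p in f.items())
    m4 = sum((x - mean) ** 4 * p for x, p in f.items())
    return mean, m2, m4


def iterate(f, times):
    for _ in range(times):
        f = smooth(f)
    return f


start = {0: Fraction(1, 2), 3: Fraction(1, 2)}  # hypothetical two-point start
for passes in (0, 4, 50, 200):
    mean, m2, m4 = moments(iterate(start, passes))
    print(passes, mean, float(m2), float(m4 / (m2 * m2)))
```

Each pass leaves the mean unchanged (the weights have mean zero) and adds 1/2 to the variance, while the ratio of the fourth moment to the squared second moment drifts toward the normal value 3.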

The linear smoothing formula is used in practical work to smooth data.
Successive application of one or many such linear formulas will usually smooth
any set of values to the normal curve of error. The above section serves as a
warning of what is introduced by the use of such methods.

It is a pleasure to acknowledge the helpful criticisms and advice of Dr. W. E.
Deming in the preparation of the manuscript.


Washington, D. C.



THE LENGTH OF THE CYCLES WHICH RESULT FROM THE
GRADUATION OF CHANCE ELEMENTS

By Edward L. Dodd

1. Introduction. Eugen Slutzky¹ found that under certain conditions
repeated summations of chance elements lead to a sinusoidal configuration.
Generalizations were made by V. Romanovsky.² A more recent paper by
Slutzky³ has appeared, summarizing his original Russian memoir, and making
extensions. Contributions to this subject have also been made by H. E. Jones,⁴
E. J. Moulton,⁵ and A. Wald.⁶

Readers who wish to get into touch with recent literature on periodicity are
referred to two excellent books, that of Karl Stumpff⁷ with bibliography of 319
references, and that of Herman Wold,⁸ with bibliography of nearly 70 references.

In this paper, I deal with the wavy configuration resulting from a single
application of a specified graduation formula. For this purpose, only linear
operators are considered. For actual graduation it is customary to require
that the sum of the coefficients or “weights” be equal to unity. But for the
present purpose, this requirement is irrelevant. For example, summing and
averaging are here essentially identical. The graduation formula considered
may or may not be the combination of simple summations or averages. Indeed,
formulas preferred by actuaries and statisticians include terms with negative
coefficients; and thus involve an operation other than addition. F. R.
Macaulay⁹ gives a chart of 24 weight diagrams. Of these only the first four are
without negative coefficients.

itself does not include reference to repetitions, mentioned by Moulton and Wald

Of course, the “waves” produced are irregular, and the difficulty of defining a
cycle-length confronts us. The apparently naive definition of a cycle-length as
the average distance between successive maxima (or minima) is, I believe, worth
consideration as a rough first approximation of the cycle length for graduated
values delivered by formulas with negative coefficients or by those involving at
least triple summations. But the cycle length thus determined is somewhat too
short; for, slight undulations will occur (Slutzky calls them “ripples”) which
should be eliminated if we want a cycle-length intuitionally reasonable. On the
other hand, the cycle-length defined as the average distance between alternate
intersections of the graduated curve with the base line is likely to be decidedly
too long, as illustrated by Slutzky's Figure 2 (loc. cit., p. 109) which exhibits
1,000 graduated items, with 41 marked maxima and 41 marked minima, after
elimination of what he considers ripples, but with only 23 up-crossings and
23 down-crossings of the base line. I indicate in what follows an analytic
method for removing ripples. And I describe several methods for obtaining a
number which might be called a cycle-length. Often these seem to cluster about a
central value, which appears to me to be a reasonable estimate of the “length
of the cycle” created by the specified graduation formula.

The theory to be presented here assumes that the chance elements are normally
distributed about zero with constant variance. But the data used by Slutzky
came from lottery drawings, with a “rectangular” distribution; and for checking
I have likewise used rectangular distributions; mainly, three sets of 600 numbers
each, taken from the tenth figures of logarithms in the Vega Tables. It is
known, however, that the average of a few elements distributed rectangularly
is nearly normal. From many tests that I have made, it would seem that
rectangular distributions react as if normal. To illustrate: When normal data
are given a twelve-fold summation or averaging by twos, the probabilities that
at a specified point there will be an up-crossing of the base line, a maximum, or an
inflection from concave to convex are respectively, 0.0628, 0.106, and 0.134.
These numbers multiplied by 100 give 6.28, 10.6, and 13.4, as the expected
number of occurrences per hundred graduated values. Slutzky exhibits in
Figure 4 (loc. cit., p. 111) 100 ordinates as the result of applying to lottery
drawings 12-fold summation by twos. The figure shows 6 or 7 up-crossings,
ten maxima, and 13 or 14 such inflections, in close agreement with expectations
based upon normal distributions.

F. R. Macaulay, The Smoothing of Time Series, Publications of the National Bureau of
Economic Research, Incorporated, No. 19 (1931). See pp. 77-79.

2. Derivation of Probabilities and Comparison of Actual with Expected
Occurrences. A “cycle length” is first conceived of as the reciprocal of a relative
frequency or probability. Thus, if the probability that a graduated value will
be a maximum is 0.05, we expect 5 maxima per hundred graduated values,
making the “cycle length” for maxima equal to 20. It will be recalled that if $p$
is the probability of an occurrence of an event in a single trial, then in $s$ trials
the expected number of occurrences is $sp$, whether the trials are independent or not.
It is assumed that the data, $x_1, x_2, \cdots$ are independent and normally distributed
about zero with constant variance $V$. Then any linear function

(1) $y_r = a_{-m} x_{r-m} + \cdots + a_0 x_r + \cdots + a_m x_{r+m}$

is likewise normally distributed about zero; and the variance of $y_r$ is $V \sum a_i^2$.
(a) Probabilities When Two Conditions Are Imposed. Consider first the
“planes” $y_{r-1} = 0$ and $y_r = 0$, each in $2m + 1$ dimensions; and jointly in $2m + 2$
dimensions. They form four “dihedral” angles. Let

(2) $\theta$ = angle between $y_{r-1} = 0$ and $y_r = 0$,

the inside points $(x_{r-m-1}, \cdots, x_{r+m})$ being such that $y_{r-1} < 0$ and $y_r > 0$.
Now, an orthogonal transformation or “rotation” leaves invariant this angle
$\theta$ and also the normal probability function:

(3) Probability = Const. $\cdot \exp\left[-\sum x_i^2 / 2V\right]$.

The angle $\theta$ may be found from

(4) $\cos\theta = \sum_{i=-m}^{m-1} a_i a_{i+1} \Big/ \sum_{i=-m}^{m} a_i^2$.

Let us think of the rotation which carries the intersection of the planes into
the “vertical” position. To find the probability that $y_{r-1} < 0$ and $y_r > 0$, we
integrate over all $2m + 2$ dimensional space which lies between the two planes
in the dihedral angle thus characterized. For $2m$ of such variables, the integration
extends from $-\infty$ to $+\infty$, yielding unity as a factor. If $u$ and $v$ are the two
variables that remain, then we are to find the volume of that portion of the solid

(5) $z = (1/2\pi V) \exp\left[-(u^2 + v^2)/2V\right]$

which lies between two vertical planes through the origin making the angle $\theta$
with each other. Then,

(6) Probability of up-crossing = $\theta/360°$.

(7) Cycle length for up-crossing = $360°/\theta$.

Let

$\Delta y_r = y_{r+1} - y_r$.

Then $y_r$ is a maximum if $\Delta y_{r-1} > 0$ and $\Delta y_r < 0$. Suppose

(8) $\theta_1$ = angle between $\Delta y_{r-1} = 0$ and $\Delta y_r = 0$,

inside points making $\Delta y_{r-1} > 0$ and $\Delta y_r < 0$. Then

(9) Probability for maximum at $y_r$ = $\theta_1/360°$.

(10) Cycle length for maxima = $360°/\theta_1$.

The same equations apply to minima; since for minima we simply reverse
the two foregoing inequalities, and pass to the equal “vertical” dihedral angle.

Likewise, from $\Delta^2 y_{r-1} < 0$ and $\Delta^2 y_r > 0$ we obtain an angle $\theta_2$ such that
$\theta_2/360°$ is the probability for change of inflection from concave downward to
convex downward. This is also equal to the probability for change of inflection
from convex to concave. Such changes of inflection have some interest on
their own account and in checking; but do not seem to have any direct bearing
upon the main problem under discussion here.

D. M. Y. Sommerville, An Introduction to the Geometry of N Dimensions, Methuen
and Co., Ltd., London, 1929. See p. 76.
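
The two-condition probabilities of this subsection can be checked numerically. The following is a minimal sketch, not the author's own computation: it takes the cosine of each angle to be the lag-one correlation of the relevant linear form, as in (4), and applies (6) and (9). The function names are illustrative assumptions, and the Spencer numerators are the standard ones for the 21-term formula.

```python
import math

def lag1_correlation(coeffs):
    """Cosine of the angle in (4): sum a_i a_{i+1} / sum a_i^2."""
    num = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    den = sum(a * a for a in coeffs)
    return num / den

def difference(coeffs):
    """Coefficients on the x's of the first difference of the linear form."""
    padded = [0] + list(coeffs) + [0]
    return [padded[i] - padded[i + 1] for i in range(len(padded) - 1)]

def sign_change_probability(coeffs):
    """theta / 360 degrees, where cos(theta) is the lag-one correlation."""
    rho = max(-1.0, min(1.0, lag1_correlation(coeffs)))
    return math.acos(rho) / (2.0 * math.pi)

def crossing_max_inflection(coeffs):
    """Probabilities of an up-crossing, a maximum, and an inflection."""
    d1 = difference(coeffs)   # first differences govern maxima
    d2 = difference(d1)       # second differences govern inflections
    return (sign_change_probability(coeffs),
            sign_change_probability(d1),
            sign_change_probability(d2))

# Twelve-fold summation by twos: binomial coefficients of order 12.
binomial12 = [math.comb(12, j) for j in range(13)]
# Spencer 21-term numerators (each divided by 350 in the formula itself;
# the common divisor does not affect the correlations).
spencer21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

for name, c in [("twelve-fold summation by twos", binomial12),
                ("Spencer 21-term", spencer21)]:
    up, mx, infl = crossing_max_inflection(c)
    print(f"{name}: up-crossing {up:.4f}, maximum {mx:.4f}, inflection {infl:.4f}")
```

For the twelve-fold summation the three values agree with the 0.0628, 0.106, and 0.134 quoted in Section 1.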

(b) Probabilities When Three Conditions Are Imposed. We consider now the
elimination of ripples. To make $y_r$ a maximum, two linear conditions are
required. A third linear condition such as $y_r > \frac{1}{2}(y_{r-k} + y_{r+k})$, or simply
$y_r > y_{r+k}$, with $k > 1$, will remove some ripples. Suppose we have given
three planes through the origin,

(11) $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0$,
     $b_1 x_1 + b_2 x_2 + \cdots + b_n x_n = 0$,
     $c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0$.

The angles between these planes in pairs are

(12) $\cos\alpha = \dfrac{\sum b_i c_i}{\left(\sum b_i^2 \sum c_i^2\right)^{1/2}}$, $\cos\beta = \dfrac{\sum a_i c_i}{\left(\sum a_i^2 \sum c_i^2\right)^{1/2}}$, $\cos\gamma = \dfrac{\sum a_i b_i}{\left(\sum a_i^2 \sum b_i^2\right)^{1/2}}$.

In general, eight trihedral angles are thus formed at the origin; since we may
take acute angles for $\alpha$, $\beta$, and $\gamma$ or their supplements. By an orthogonal
transformation or “rotation about the origin” we are led to the three dimensional
problem of finding the portion of a sphere lying in a specified spherical
pyramid with base a spherical triangle, $ABC$, having spherical excess
$E = A + B + C - 180°$. Now the spherical surface is 4 great circles or 720°.
Hence, for a maximum, subject to an additional linear homogeneous inequality,

(13) Probability of conditioned maximum = $E/720°$,

care having been taken to enter the proper trihedral angle.
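
A minimal sketch of (13) for the Spencer 21-term formula follows. It is an illustration under stated assumptions rather than the author's worksheet: each condition is written as a linear form that must be positive, the interior angle of the spherical triangle at each edge is taken as 180° minus the angle between the two normals, and the helper names (`shifted_form`, `combine`, `trihedral_probability`) are hypothetical.

```python
import math

def shifted_form(coeffs, shift):
    """Offset -> weight map for y_{r+shift}, where y_r = sum_j a_j x_{r+j}."""
    m = len(coeffs) // 2
    return {j - m + shift: a for j, a in enumerate(coeffs)}

def combine(*terms):
    """Linear combination of forms; each term is a (weight, form) pair."""
    out = {}
    for weight, form in terms:
        for k, v in form.items():
            out[k] = out.get(k, 0) + weight * v
    return out

def cosine(f, g):
    """Cosine of the angle between the normals of f = 0 and g = 0, as in (12)."""
    dot = sum(v * g.get(k, 0) for k, v in f.items())
    nf = math.sqrt(sum(v * v for v in f.values()))
    ng = math.sqrt(sum(v * v for v in g.values()))
    return max(-1.0, min(1.0, dot / (nf * ng)))

def trihedral_probability(f, g, h):
    """P(f > 0, g > 0, h > 0) = E / 720 degrees for independent normal data."""
    # Interior angle at each edge = 180 degrees minus the angle between normals.
    angles = [math.pi - math.acos(cosine(u, v))
              for u, v in ((f, g), (f, h), (g, h))]
    excess = sum(angles) - math.pi          # spherical excess E, in radians
    return excess / (4.0 * math.pi)         # whole sphere = 720 degrees = 4*pi

def conditioned_max_probability(coeffs, k):
    """Probability of a maximum at y_r with the extra condition y_r > y_{r+k}."""
    y_r = shifted_form(coeffs, 0)
    rising = combine((1, y_r), (-1, shifted_form(coeffs, -1)))   # y_r > y_{r-1}
    falling = combine((1, y_r), (-1, shifted_form(coeffs, 1)))   # y_r > y_{r+1}
    above = combine((1, y_r), (-1, shifted_form(coeffs, k)))     # y_r > y_{r+k}
    return trihedral_probability(rising, falling, above)

spencer21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
print(round(conditioned_max_probability(spencer21, 5), 4))
```

The sketch assumes the three normals are linearly independent, so that the region is a proper trihedral angle.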

(c) Probabilities When Four Conditions Are Imposed. To avoid complexities
involved in the use of four intersecting planes, the following expedient was
employed. Consider a set of values of $r$ such that this $y_r$ is a maximum. Among
these there is theoretically a certain fraction or proportion $p$ at which also
$y_r > y_{r+k}$, with $k > 1$; and the same proportion $p$ satisfying $y_r > y_{r-k}$. Let $p'$
be the proportion satisfying both inequalities. Then $1 - p' \le (1 - p) + (1 - p)$
leads to

(14) $p' \ge 2p - 1 = p^2 - (1 - p)^2$.

If $p$ is fairly close to unity, a good approximation for $p'$ would seem to be

(15) $p' = p^2$.

This $p^2$ would have been exact for $p'$, had the graduated values been independent.
That $p'$ is here only slightly above $2p - 1$ seems likely, from the graduations
that I have examined; for, the failure of one of the inequalities $y_r > y_{r+k}$ or
$y_r > y_{r-k}$ was seldom accompanied by the failure of the other.

For graduations with the Spencer 21-term formula, when $k = 5$, $p = 0.936$,
and $(1 - p)^2 = 0.0041$, which is fairly small. In practice, we would find in
this case directly $P = 0.07125$ = probability of a maximum; $Pp = 0.0668$ =
probability of a maximum at $y_r$ with $y_r > y_{r+5}$. Then the probability $Pp'$ of a
maximum at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$ would have as lower bound
$2Pp - P = 2(0.0668) - 0.07125 = 0.06235$.

But a closer approximation to the actual value would seem to be $Pp^2 =
(Pp)^2/P = (0.0668)^2/0.07125 = 0.0626$.

This would give a cycle length of $1/0.0626 = 15.97$.
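
The arithmetic of the lower bound (14) and the approximation (15) can be reproduced from the two probabilities quoted above. This is a small sketch; the function name is an illustrative assumption.

```python
def four_condition_estimates(P, Pp):
    """Return (lower bound, approximation) for the two-sided conditioned maximum.

    P  : probability of a maximum
    Pp : probability of a maximum that also exceeds the value k steps ahead
    """
    lower = 2.0 * Pp - P      # P * (2p - 1), from (14)
    approx = Pp * Pp / P      # P * p**2, from (15)
    return lower, approx

lower, approx = four_condition_estimates(0.07125, 0.0668)
print(f"lower bound {lower:.5f}, approximation {approx:.4f}, "
      f"cycle length {1.0 / approx:.2f}")
```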

(d) Indications from Correlation Theory. We may also attempt to estimate
a cycle length with the aid of correlation theory. If for graduation we use a
linear operator with coefficients proportional to successive ordinates of a cosine
curve with a specified period, it is, I presume, fairly well known that the
graduated values tend to exhibit the period of that cosine curve. Moreover, this
quasi-period may be induced very strongly with the use of formulas which
represent “damped vibration” as shown by H. Labrouste and Mrs. Labrouste.
Now many standard graduation formulas have plots resembling somewhat
damped vibration. Here, the central symmetrical arch leading down to the
lowest negative terms on each side is usually large in comparison with the
flanking waves. Now for a strict cosine curve of period $2k$, the coefficient of
correlation of $y_r$ and $y_{r+k}$ is $-1$, at least theoretically. For chance material $x_t$,
with mean zero and constant variance, the coefficient of correlation between $y_r$
and $y_{r+j}$ is defined in terms of expected values, thus:

(16) $\rho_j = E(y_r y_{r+j})/E(y_r^2)$.

For graduated values, $y_r$, we might then seek the value $j$ which will make $\rho_j$
as close to $-1$ as possible. But for most common graduation formulas, $\rho_j$ does
not approximate $-1$. This difficulty, however, disappears if the graduation
formula is properly centered. In a Fourier series, there is a constant term,
which is of no significance in indicating oscillations, and is sometimes eliminated.
The analogous modification for a linear graduation formula with $n$ coefficients,
of which the sum is unity, would seem to be the subtraction of $1/n$ from each
coefficient, forming what I regard as a residual. For this residual, negative
correlations of substantial size appear. And that $j$ with which the numerically
largest negative correlation $\rho_j$ is associated may be considered as indicating a
half-cycle length.

In the case of the Spencer 21-term formula, $j = 8$, making cycle-length = 16,
just about identical with the cycle length for maxima at $y_r$ with $y_r > y_{r-5}$ and
$y_r > y_{r+5}$.

H. and Mrs. Labrouste, “Harmonic analysis by means of linear combinations of
ordinates,” Terrestrial Magnetism and Atmospheric Electricity, Vol. 41 (1936), pp. 15-28.
See pp. 17, 18.
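
The residual correlations described in (d) can be tabulated directly. The sketch below assumes that $\rho_j$ in (16), for independent data of constant variance, reduces to the lagged sum of products of the coefficients divided by their sum of squares; the function names are illustrative.

```python
def autocorrelation(coeffs, lag):
    """rho_j of (16) for a linear form applied to independent data."""
    num = sum(a * b for a, b in zip(coeffs, coeffs[lag:]))
    return num / sum(a * a for a in coeffs)

def residual(numerators):
    """Scale the coefficients to sum to unity, then subtract 1/n from each."""
    n = len(numerators)
    total = sum(numerators)
    return [a / total - 1.0 / n for a in numerators]

spencer21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
res = residual(spencer21)
rhos = {j: autocorrelation(res, j) for j in range(1, len(res))}

# The lag with the numerically largest negative correlation is read
# as a half-cycle length.
half_cycle = min(rhos, key=rhos.get)
print(half_cycle, round(rhos[half_cycle], 3), "cycle length", 2 * half_cycle)
```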

(e) The Period of a Closely Fitting Cosine Curve. By another route, also, we
may approach the problem of associating with a specified linear graduation a
number as the length of induced cycles. We shall consider here only those
formulas in which the coefficients are symmetrical with respect to the center.
In equation (1), this means that $a_{-j} = a_j$, $j = 1, 2, \cdots, m$. Suppose now that
the $x$'s are no longer chance elements, but are the successive terms of a cosine
curve with period $k$. That is:

(17) $x_r = \cos(r\theta + \alpha)$; $\theta = 2\pi/k = 360°/k$.

Then, if $a_{-j} = a_j$, it follows that

(18) $a_{-j} x_{r-j} + a_j x_{r+j} = 2 a_j \cos j\theta \, \cos(r\theta + \alpha)$.

Then, from (1),

(19) $y_r = C \cos(r\theta + \alpha)$,

where $C$ is independent of $r$. For a given graduation formula, with $a$'s specified,
this $C$ depends upon $\theta$, or we may say, upon $k = 360°/\theta$. We may regard
the graduation formula $y_0$ as “fitting best” the curve $\cos[r(360°)/k]$ when $k$ is so
chosen as to give to $C$ a largest value. The presumption is that the graduation
formula will curl chance data up into cycles in about the same fashion as a
cosine curve to which it is closely akin. The actual period of this closely
fitting cosine curve may then be taken as the quasi-period or “cycle-length”
of the graduation formula.

If, relying upon intuition, we were to select a cosine curve to fit a given
graduation formula, we might easily decide to disregard the small waves that
usually flank the central arch, and to take a cosine curve with a span (distance
between minima) equal to the span of this central arch. In fact, this span
gives, I believe, a good first estimate of the cycle length of the induced waves. This
first estimate seems, however, a trifle too small.
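
Summing (18) over $j$ gives $C = a_0 + 2\sum_j a_j \cos j\theta$ in (19). The sketch below only checks that identity numerically for the Spencer coefficients; it does not attempt to reproduce the author's criterion for the best-fitting period, whose normalization is not spelled out in the text. The function names and the test values of $r$, $\theta$, and $\alpha$ are arbitrary assumptions.

```python
import math

def gain(coeffs, theta):
    """C of (19) for a symmetric formula: a_0 + 2 * sum_j a_j cos(j*theta)."""
    m = len(coeffs) // 2
    return coeffs[m] + 2.0 * sum(coeffs[m + j] * math.cos(j * theta)
                                 for j in range(1, m + 1))

def graduate_cosine(coeffs, r, theta, alpha):
    """Apply (1) directly to the data x_t = cos(t*theta + alpha) of (17)."""
    m = len(coeffs) // 2
    return sum(a * math.cos((r + j - m) * theta + alpha)
               for j, a in enumerate(coeffs))

spencer21 = [c / 350 for c in (-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
                               57, 47, 33, 18, 6, -2, -5, -5, -3, -1)]

theta = 2.0 * math.pi / 16      # a cosine curve of period 16, chosen arbitrarily
direct = graduate_cosine(spencer21, r=3, theta=theta, alpha=0.4)
via_gain = gain(spencer21, theta) * math.cos(3 * theta + 0.4)
print(direct, via_gain)
```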

3. Size of Ripples, Simple Summation, Variability, and Height of Waves.

(a) Size of Ripples. In the use of $y_r > y_{r+k}$ to remove ripples, what integer
should we take for $k$? The dividing line between ripples and waves is of course
arbitrary. As Figure 2, p. 109, Slutzky exhibits 1,000 graduated values from
two-fold summations by 10, with ripples removed. He states (p. 119): “maxima
and minima with amplitudes of ten units or less being discarded as ripples.”
For this double summation, I find that the probability that $y_r$ will be a maximum
with $y_r > y_{r+10}$ and $y_r > y_{r-10}$ is approximately 0.0437. Among 1,000 graduated
values, 43.7 such maxima would then be expected. Slutzky marks with arrows
the 41 maxima which remain after the elimination of what he regards as ripples.
The reciprocal of 0.0437 gives 22.9 as cycle length. Then $k = 10$ is less than
half this cycle-length. For standard graduation formulas, it would seem likely
that a value of $k$ about one-third the span of its central arch would eliminate fairly
well the inconsequential fluctuations; and likewise for graduations with coefficients
forming an arch with nearly horizontal ends, like twelve-fold summation
by twos, with arch span 12. For this twelve-fold summation, I find that
0.0831 is the probability that a maximum will occur at $y_r$, with $y_r > y_{r+4}$
and $y_r > y_{r-4}$, giving 8.31 such maxima per hundred graduated values.
Slutzky’s Figure IVa shows eight such maxima, and two ripples.

(b) Simple Summation. I shall not discuss in detail the cycles produced by
simple summation or averaging. Formulas for probability here are relatively
simple. Thus, for the sum or average of $n$ normal chance data, the probability
of a maximum is 1/4, irrespective of the value of $n$. This appears to be about
valid for rectangular data if we count the weak maxima. A simple average of
chance data, however, seems to inherit largely the chaotic character of the parent
data. But some sinuosity is, after all, implanted.

(c) Variability. A general discussion of the variability of induced waves is
beyond the scope of this paper. However, I record a numerical result. For
the Spencer 21-term graduation formula, the probability of a maximum is
0.07125. Among 580 graduated values, then, 41.3 maxima would be expected.
Actually, 42 maxima were found. Now, if $n - 1$ points are placed “at random”
on a line of unit length (here $dx$ is the probability that a point will fall in an
interval of length $dx$) then the expected value of the sum of the squares of
the resulting $n$ segments is $2/(n + 1)$. Thus, if 42 points are placed at random
on an interval of 580 units, the expected sum of the squares of the segments
is $(2/44)(580)^2 = 15,290.9$. But, if the points are placed at equal
intervals, this sum of squares takes its least value, $(580)^2/43 = 7,823.3$. Then,
15,290.9 − 7,823.3 = 7,467.6. On the other hand, the 42 maxima among
Spencer graduated values gave segments for which the sum of the squares was
8,656.5; that is, only 833.2 in excess of the above 7,823.3, which represents perfect
periodicity for maxima. Of course, this excess of 833.2 indicates considerable
departure from perfect periodicity; but it is nowhere near the 7,467.6 to be
expected from a random distribution of points. In spite of irregularities, the
sinusoidal character of graduated values is conspicuous.
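
The two reference values used in this comparison follow from the segment formulas just quoted. The sketch below reproduces them; the function names are illustrative assumptions, and `sum_sq_segments` is included only to show how an observed set of positions would be scored.

```python
def expected_random_sum_sq(num_points, length):
    """Expected sum of squared segments for points placed at random.

    num_points points give n = num_points + 1 segments; on a unit line the
    expectation is 2 / (n + 1), scaled here by length**2.
    """
    n = num_points + 1
    return 2.0 * length ** 2 / (n + 1)

def equal_spacing_sum_sq(num_points, length):
    """Least possible sum of squared segments (perfect periodicity)."""
    n = num_points + 1
    return length ** 2 / n

def sum_sq_segments(points, length):
    """Sum of squared segments for a given set of positions on [0, length]."""
    cuts = [0.0] + sorted(points) + [float(length)]
    return sum((b - a) ** 2 for a, b in zip(cuts, cuts[1:]))

random_value = expected_random_sum_sq(42, 580)
periodic_value = equal_spacing_sum_sq(42, 580)
print(round(random_value, 1), round(periodic_value, 1),
      round(random_value - periodic_value, 1))
```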

W. Burnside, Theory of Probability, Cambridge University Press, 1928. See p. 71.

(d) The Height or Amplitude of Induced Waves. While our chief interest
here lies in what is called the length of a cycle, a brief reference may well be
made to the amplitude or height of the induced waves. The operation of the
linear function $y_r$ in (1) upon data with variance $V$ yields graduated values with
variance $V \sum a_i^2$. This particular statement does not require the assumption
of normality. Thus the Spencer 21-term formula is expected to produce graduated
values with a standard deviation 37.8% of that of the data. This represents
some reduction, of course; but, nevertheless, the “waves” stand out in bold
relief. They are not diminutive.
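
The 37.8% figure is the square root of $\sum a_i^2$ for the Spencer coefficients. A short check, with an illustrative function name and the coefficients entered as numerators over their sum:

```python
import math

def sd_ratio(numerators):
    """Standard deviation of graduated values relative to that of the data."""
    total = sum(numerators)
    return math.sqrt(sum((a / total) ** 2 for a in numerators))

spencer21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
print(f"{100 * sd_ratio(spencer21):.1f}%")
```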

4. Data and Graduations Examined. Slutzky's graduations, exhibited in
Econometrica, Vol. 5, have already been mentioned. Three sets of chance data
were graduated by students at the University of Texas, Mr. Victor W. Pfeiffer
in 1936, Mr. C. M. Tolar and Miss Anna Velma Martin in 1938, to make tests
with regard to smoothing coefficients, the results appearing in M.A. theses.
The data were figures in the tenth place of the Vega logarithm tables, 600 numbers
in each set, as follows: Logarithms from 200 to 799; logarithms of cosines
of angles from 0° to 59° 54', by intervals of 6'; logarithms of sines of angles
from 6' to 60° by intervals of 6'. The graduation formulas used were all symmetric,
with $a_{-j} = a_j$. Mr. Pfeiffer used the Spencer 21-term formula, with
coefficients 1/350 of

−1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, 57, etc.

The other two formulas used were 11-term formulas which I devised, correct
to third differences, and with fourth differences rather small, described by:
$-1.13D^4$ and $-0.97D^4$ where $D = \log_e E$ (see Henderson, loc. cit., pp. 26-37);
as compared with $-5.4D^4$ for Woolhouse 15-term, and $-12.6D^4$ for Spencer
21-term. These two 11-term formulas are:

(i) Averaging by twos, threes, and fours, applied to (1/12)(−4, 3, 14, 3, −4),
yielding (1/288)(−4, −9, 3, 36, 73, 90, 73, 36, 3, −9, −4);

(ii) Triple averaging by threes, applied to (1/10)(−3, 2, 12, 2, −3), yielding
(1/270)(−3, −7, 0, 29, 71, 90, 71, 29, 0, −7, −3).

From part of the foregoing data, also, I made other graduations to check
certain probabilities.
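
The two 11-term formulas can be rebuilt by repeated convolution, which gives a check on the printed numerators. The sketch also evaluates the coefficient of $D^4$ as $\sum_j a_j j^4 / 24$; that expression is an assumption about how the smoothing coefficient is defined, although it reproduces the three values quoted above. All function names are illustrative.

```python
def convolve(a, b):
    """Discrete convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def summation_formula(spans, kernel):
    """Numerators obtained by summing by each span in turn, applied to kernel."""
    coeffs = list(kernel)
    for n in spans:
        coeffs = convolve(coeffs, [1] * n)
    return coeffs

def fourth_difference_coefficient(numerators):
    """Assumed D^4 coefficient: sum_j a_j j^4 / 24, with a_j scaled to sum 1."""
    m = len(numerators) // 2
    total = sum(numerators)
    return sum(a * (j - m) ** 4 for j, a in enumerate(numerators)) / (24 * total)

formula_i = summation_formula([2, 3, 4], [-4, 3, 14, 3, -4])    # divisor 288
formula_ii = summation_formula([3, 3, 3], [-3, 2, 12, 2, -3])   # divisor 270
spencer21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

for name, c in [("(i)", formula_i), ("(ii)", formula_ii), ("Spencer", spencer21)]:
    print(name, sum(c), round(fourth_difference_coefficient(c), 2))
```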

Robert Henderson, Graduation of Mortality and Other Tables, Actuarial Society of
America, New York, 1919, p. 34.

5. Cycle Lengths for the Spencer 21-Term Graduation Formula. All the
various determinations of cycle length mentioned in the foregoing were applied
to the Spencer formula, and to some other formulas. The results obtained for
the Spencer formula seem representative, and will be given here in detail. Our
main conception of a cycle-length is that it is the reciprocal of a probability or
relative frequency. The probability of a minimum is the same as that of a
maximum; of a down-crossing of the base line, the same as that of an up-crossing.
Probabilities are listed that the representative ordinate $y_r$ will be a maximum,
with or without further restrictions. The probability is given for an up-cross
at the representative abscissa $x_r$. In the table which follows, a middle entry
for a cycle length of 16 is obtained from the “residual” described in (d) of
Section 2.


Expected Length of Cycles Produced When Normal Chance Data Are Graduated by the
Spencer 21-term Formula in Accordance with Various Specifications for the Cycle

Specification                                                      Probability   Cycle-Length
Maximum at $y_r$                                                   0.07125       14.0
Maximum at $y_r$ with $y_r > y_{r+5}$                              0.0668        15.0
Maximum at $y_r$ with $y_r > \frac{1}{2}(y_{r-7} + y_{r+7})$       0.0657        15.2
Maximum at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$          0.0626        16.0
By use of “residual.” (See 2(d))                                                 16.0
Maximum at $y_r$ with $y_r > y_{r+7}$                              0.0623        16.1
Period of “best fitting” cosine curve. (See 2(e))                                16.7
Maximum at $y_r$, $y_r > 0$. (Or: $y_r$ > Mean $y_r$)              0.0591        16.9
Maximum at $y_r$ with $y_r > y_{r+7}$ and $y_r > y_{r-7}$          0.0545        18.3
Up cross at $x_r$                                                  0.0469        21.3


The foregoing exhibit seems to suggest a cycle length of something like 16 for the
cycles created by the operation of the Spencer 21-term formula upon chance data.
This is just about the reciprocal of the probability that a maximum will occur
at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$. If 16 is thus set up as the standard wave
length, a wave of 10 units extending from $x_{r-5}$ to $x_{r+5}$ would not be regarded as
insignificant.

Now 16 is also the interval between the outermost low coefficients, −5, in
the Spencer formula. The plot of a curve through ordinates equal to the
Spencer coefficients would probably make the central arch have a span of
about 15. This 15 seems a little too small as a representative of cycle lengths
obtained by the foregoing different methods.

From the theory set forth, 0.0626 is the probability that a maximum will
occur at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$. Then among 580 graduated values,
36.3 such maxima would be expected. Among the Pfeiffer graduated values 38
were actually found.

6. Comparative Results of Seven Graduation Formulas. An exhibit will now
be made of results obtained from seven graduation formulas. Of these, the
simplest is double averaging or summation by tens, with coefficients forming a
triangular arch, with a “span” which will be set down here as 18. Next in
order of simplicity (avoiding negative coefficients) is 12-fold averaging by twos.
Probabilities are given that a maximum will occur at a point $y_r$, with $y_r > y_{r-k}$
and $y_r > y_{r+k}$, for what seem to be appropriate values of $k$. In the five cases
where graduations were made, the numbers of the maxima of specified character
actually found are set down in line with their expected values. Also the span
of the central arch is compared with cycle lengths.

Macaulay (loc. cit., pp. 73, 74) mentions favorably a 43-term formula obtained
as follows: Summation by 8, by 12, doubly by 5, applied to weights: +7, −10,
0, 0, 0, 0, 0, 0, +10, 0, 0, 0, 0, 0, 0, −10, +7. This is the longest formula to
be considered here.

As noted before, my theory is based upon the assumption of a normal
distribution for the data. The data actually tested had a rectangular distribution.
Nevertheless, close agreement was found between the expected number of
maxima and the number actually found.


Results of Applying Seven Graduation Formulas to Chance Data. Comparison of the Expected
Number of Conditioned Maxima with the Actual Number Found Among Graduated
Values, and Comparison of Cycle Length with Span of Central Arch

Columns:
(1) Graduation formula
(2) $k$
(3) Probability of a maximum at $y_r$ with $y_r > y_{r-k}$ and $y_r > y_{r+k}$
(4) Number of graduated items, $y_r$
(5) Expected number of such maxima
(6) Actual number of such maxima
(7) Cycle as 1/(3)
(8) Span of central arch

(1)                               (2)    (3)      (4)     (5)    (6)   (7)    (8)
11-term by Tolar                   3     0.110     590    64.9    67    9.09    8
11-term by Martin                  3     0.114     590    67.3    65    8.77    8
13-term $(2)^{12}$ by Slutzky      4     0.0831    100    8.31     8   12.0    12
19-term $(10)^2$ by Slutzky       10     0.0437  1,000    43.7    41   22.9    18
21-term Spencer by Pfeiffer        5     0.0626    580    36.3    38   16.0    15
29-term Kenchington                8     0.0428                        23.4    20
43-term Macaulay                   9     0.0389                        25.7    22


7. Summary. E. Slutzky found that the summing of chance data resulted
in series of numbers with something like a cyclic appearance, this being
intensified by repetition of the summing. Slutzky and others have proven limit
theorems. In this paper, I study the effects of a single application of a
graduation process upon chance data. The most acceptable graduation formulas
contain negative coefficients, and thus involve something more than repeated
summations. Several methods are discussed for assigning to a given graduation
formula a number as the length of the cycles it tends to produce. One of the
most satisfactory of these is in line with the suggestion of Slutzky that before
counting maxima, any insignificant “ripples” should be eliminated. The
probability is found that a graduated value $y_r$ should be a maximum, greater than
the two adjacent values $y_{r-1}$ and $y_{r+1}$, with the further condition that for some
appropriate $k$, $y_r$ shall be greater than $y_{r-k}$ or $y_{r+k}$ or both. The reciprocal of
this latter probability is suggested as the length of the cycle which the given
graduation tends to implant in the graduated values.

The University of Texas. 



ON THE DISTRIBUTION OF THE “STUDENT” RATIO FOR SMALL 
SAMPLES FROM CERTAIN NON-NORMAL POPULATIONS' 

By H. L. Rietz 

Much of interest in the theory and practice of statistical methods has been
developed around the distribution function,

(1) $\dfrac{\Gamma\left(\frac{N}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{N-1}{2}\right)}\,(1 + z^2)^{-N/2}$

of the “Student” ratio, $z = \dfrac{\bar{x} - m}{s}$, where $\bar{x}$ denotes the mean, $s$ the standard
deviation of a sample of $N$ items, say $x_1, \cdots, x_N$, taken at random from a
normally distributed parent population of mean, $m$.

The investigations of certain non-normal parent distributions by Shewhart
and Winters [1], Rider [2], E. S. Pearson [3], M. S. Bartlett [4], and R. C. Geary
[5] indicate that applications of the “Student” theory give more satisfactory
results than the classical theory for a considerable variety of non-normal parent
distributions, but some of these investigators find that the theory fails in certain
cases to describe the facts to an extent that suggests further experimental
sampling investigations along this line whenever suitable data are available.
Others infer that a completely satisfactory analysis of the position of the
“Student” z-test will be possible only if the theoretical distribution of $z$ in samples
from the non-normal distribution in question becomes known. Several of the
above named statisticians have attributed the failures of the distribution (1)
to describe their data, in large part, to the correlation between $x = \bar{x} - m$ and $s$.
For this reason, there is considerable interest in the degree of correlation between
$x = \bar{x} - m$ and $s$, and especially in the nature of the regression of $s$ or of $s^2$ on $x$.

The present paper gives an analysis of data obtained by experimental sampling
from two non-normal distributions whose sources we shall now describe. The
parent distributions with which the paper is concerned are theoretical
distributions resulting from certain urn schemata devised [6] by the writer some years ago.
In 1925, Leone E. Chesire, in an unpublished thesis for the degree, Master of
Science, at the University of Iowa, obtained data by experimental sampling
that seem to be appropriate material for a study of the correlation of mean and
standard deviation for small samples from certain non-normal distributions.
One of the original bivariate parent populations, whose marginal totals we are
using, exhibited linear regression while the other exhibited non-linear regression.
For convenience in distinguishing between the two cases, we shall speak of
material from the linear case as Case I and that from the non-linear case as
Case II. After devising a scheme for drawing pairs of variates at random,
5,000 pairs were drawn in sets of five for each of the two cases.

Presented in part before the American Mathematical Society under a somewhat
different title, November 26, 1937.

While the primary purpose of this experimental sampling was to study the
distributions of means, standard deviations, and correlation coefficients [7] for
small samples from the non-normal populations, we have as a by-product, in the
marginal totals of the correlation tables, four sets of 1,000 pairs of means and
standard deviations. However, since three of the four sets of marginal totals
of the two theoretical parent correlation tables are alike, we have actually
only two significantly different sets to consider.

Case I. For the case of linear regression of $y$ on $x$ in the bivariate parent
population, the parent distribution from the marginal totals may be very simply
described by showing the frequency distribution in Table 1.


TABLE 1

Sums in second throw of dice
(values of stochastic variable)    2    3    4    5    6    7    8    9   10   11   12
Frequency                          6   12   18   24   30   36   30   24   18   12    6

The moment coefficients and $\beta$'s which characterize the distribution given in
Table 1 are:

Mean = 7, $\mu_2 = 5\tfrac{5}{6}$, $\mu_3 = 0$, $\mu_4 = 80.5$, $\beta_1 = 0$, $\beta_2 = 2\tfrac{64}{175}$.
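
The moment coefficients of the parent distribution can be checked exactly. The sketch below assumes that the frequencies are six times the usual distribution of the sum of two dice over the values 2 to 12, which is what the stated mean, $\mu_2$, $\mu_3$, and $\mu_4$ imply; the variable names are illustrative.

```python
from fractions import Fraction

# Assumed parent distribution: sums of two dice, frequencies scaled to 216.
values = list(range(2, 13))
freqs = [6, 12, 18, 24, 30, 36, 30, 24, 18, 12, 6]
total = sum(freqs)

mean = Fraction(sum(v * f for v, f in zip(values, freqs)), total)

def central_moment(k):
    """k-th moment about the mean, kept as an exact fraction."""
    return sum(Fraction(f) * (v - mean) ** k
               for v, f in zip(values, freqs)) / total

mu2, mu3, mu4 = (central_moment(k) for k in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(mean, mu2, mu3, mu4, beta1, beta2, float(beta2))
```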

Each of the 1000 sets of five drawn from the distribution in Table 1 yields a
mean $\bar{y}$ and a standard deviation, $s_y$, which we shall denote by $w$ to make our
notation simpler to write. Table 2 is the correlation table of the pairs $(\bar{y}, w)$.
The correlation coefficient $r_{w\bar{y}}$ between mean $\bar{y}$ and standard deviation $s_y = w$
has a value

$r_{w\bar{y}} = -0.020 \pm 0.021$

which differs insignificantly from zero.

The uncorrected value of the correlation ratio of $w$ on $\bar{y}$ is

$\eta_{w\bar{y}} = 0.182$.

When we remember that the correlation ratio is not free to vary in the negative
direction from 0, and apply the Pearson correction [8] for this situation together
with the “Student” correction [9] for grouping, we obtain for the corrected
$\eta_{w\bar{y}}$ the value 0.133.

It becomes fairly obvious that significant correlation exists and that the
regression is non-linear. Indeed, it has been shown recently by Geary [5,
pp. 178-9] that normality in the parent distribution is both a necessary and
sufficient condition for the independence of the mean and standard deviation
in samples.

TABLE 2

Correlation of mean $\bar{y}$ and standard deviation $s_y = w$, of samples of five items for Case I.
Mean of $\bar{y}$'s = 7.141. Correlation coefficient $r_{w\bar{y}} = -0.020 \pm 0.021$.
$\bar{s}_y = \bar{w} = 2.079$. Correlation ratio of $w$ on $\bar{y}$, $\eta_{w\bar{y}} = 0.182$ (uncorrected).

Since the number of correlated items, $N = 1000$, is fairly large, we examine
into the significance of $\eta_{w\bar{y}} = 0.182$ under the assumption that $N\eta_{w\bar{y}}^2$ is
approximately distributed [10] as $\chi^2$ with $a - 1 = 16$ degrees of freedom. This criterion
gives odds in favor of significant correlation on approximately a 100 to 1 level of
probability.

Next, the means of arrays, $\bar{w}_p$, were plotted to scale on Table 2 to give a
general notion of the nature of the regression of $w = s_y$ on $\bar{y}$. The location of
these means of arrays of $w$'s affords at least a suggestion of parabolic regression
[11] with the curvature concave downward as is to be expected when $\beta_2 - \beta_1 -
3 < 0$, where the $\beta$'s relate to the parent distribution.
The next step taken was to analyze the variance, as indicated partly in Table 3,
where $w_t$ $(t = 1, 2, \cdots, N)$ denotes the stochastic variates, $a$ the number of
arrays of $w$'s, $\bar{w}$ the mean of the $N$ values of $w_t$, $n_p$ $(p = 1, 2, \cdots, a)$ the number
of variates in an array marked $p$, $\bar{w}_p$ the mean of the array marked $p$, and where
the class interval in Table 2 is taken as the unit.


TABLE 3

                                               Sum of squares                                            Degrees of freedom
For deviations of means of arrays of $w$'s     $\sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2 = 380$         $a - 1 = 16$
For deviations of variates from the
means of their arrays                          $\sum_{p=1}^{a}\sum_{t}(w_t - \bar{w}_p)^2 = 11,098$      $N - a = 983$
Total                                          $\sum_{t=1}^{N}(w_t - \bar{w})^2 = 11,478$                $N - 1 = 999$


In the exhibit given in Table 3, we use the usual algebraic identity

(2) $\sum_{t=1}^{N}(w_t - \bar{w})^2 = \sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2 + \sum_{p=1}^{a}\sum_{t}(w_t - \bar{w}_p)^2$

where the double sum is made up of a sum of $N$ squares.

By dividing the members of (2) by $N$, we have

(3) $\frac{1}{N}\sum_{t=1}^{N}(w_t - \bar{w})^2 = \frac{1}{N}\sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2 + \frac{1}{N}\sum_{p=1}^{a}\sum_{t}(w_t - \bar{w}_p)^2$.

The writer has used the identity (3) for many years in lectures to beginners in
statistics in proving the equivalence of two definitions of the correlation [12]
ratio and is strongly of the opinion that the equality in form (3) appeals more
readily to the intuitions of many readers, because of their acquaintance with
statements in the language of averages, than does the equivalent equality (2)
in the language of sums of squares.

In an extended and more compact form, the analysis is shown in the standard
form in Table 4.


TABLE 4

Variance          Degrees of freedom   Sum of squares   Mean square   z-test
Between arrays            16                  380           23.75     $\frac{1}{2}\log_e 23.75 = 1.584$
Within arrays            983               11,098           11.29     $\frac{1}{2}\log_e 11.29 = 1.212$
Total                    999               11,478                     Difference = 0.372


When the sum of squares equal to 380 associated with variance between arrays
is further analyzed into a part which could be represented by linear regression,
and a part which represents deviations of the calculated means of arrays of $w$'s
from a straight regression line of $w$ on $\bar{y}$, the deviations being measured parallel
to the $w$-axis, we find that the part of the amount 380 represented by linear
regression is given by

$N r^2 s_w^2 = 1000\,(0.00040)(11.487) = 4.3$.

Since both $r = -0.020 \pm 0.021$ and the small value, 4.3, as part of the sum of
squares amounting to 380, may well be regarded as sampling fluctuations, we
revert to the figures in Table 3 and apply the Fisher z-test. It turns out that the
correlation is significant on practically the 100 to 1 level of probability, which
conforms well with the above inference based on the assumption that $N\eta_{w\bar{y}}^2$ is
distributed as $\chi^2$, with $a - 1$ degrees of freedom.
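
The entries of the analysis of variance can be recomputed from the two sums of squares. The sketch below uses only figures stated in the text (380, 11,098, $N = 1000$, and $a = 17$ arrays); the function name and the returned keys are illustrative assumptions.

```python
import math

def variance_analysis(ss_between, ss_within, n, arrays):
    """Mean squares, Fisher's z, correlation ratio, and N * eta**2."""
    df_between = arrays - 1
    df_within = n - arrays
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Fisher's z: half the difference of the natural logs of the mean squares.
    z = 0.5 * math.log(ms_between) - 0.5 * math.log(ms_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "z": z,
        "eta": math.sqrt(eta_sq),
        "n_eta_sq": n * eta_sq,
    }

result = variance_analysis(380, 11098, n=1000, arrays=17)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

The value of `n_eta_sq` is the quantity referred to the $\chi^2$ distribution with 16 degrees of freedom in the text.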

Next, we computed 1000 values of the “Student” ratio $z = (\bar{y} - 7)/w$, for
Case I. One of these 1000 values was of the indeterminate form $\frac{0}{0}$. A frequency
distribution of the 999 determinate ratios is shown in column (3), Table 5.

By grouping together the class frequencies at the tails of the theoretical
distributions until each of the end class frequencies is not less than 6, and calculating
$\chi^2$ for the observed distribution in column (3) in comparison with the theoretical
distribution in column (6) as found from the “Student” theory in samples of 5
items from a normal distribution, we obtain $\chi^2 = 3.728$ with 11 degrees of
freedom.

Thus, the differences between the distribution in column (3) and the “Student”
distribution for $N = 5$ shown in column (6) are not only insignificant under the
$\chi^2$-test, but are so small as to be expected in a relatively small percentage of
statistical experiments even if the “Student” z-distribution were the theoretically
exact distribution of our ratios.

The usual moment coefficients of the distribution of observed $z$'s in column (3), 
Table 5, are: 

$\mu_1' = 0.033533$, $\mu_3 = 0.254383$, $\beta_1 = 0.55955$, 

$s = \sqrt{\mu_2} = 0.69799$, $\mu_4 = 2.22504$, $\beta_2 = 9.37353$. 

Since the value, 0.69799, of the standard deviation of the observed distribution 
differs very little from $1/\sqrt{N - 3} = 0.70711$, the normal curve fitted by using 
the standard deviation of the observed distribution (column 4, Table 5) differs 
very little from the normal curve with the origin at the population mean and 
standard deviation $\sqrt{2}/2$ (column 5). Furthermore, the application of the 
$\chi^2$-test to columns (4) and (5) of Table 5 with class frequencies in the “tails” 
grouped as above gives $\chi^2 = 2.91$ with 9 degrees of freedom. 

The moment coefficients of the observed distribution indicate a markedly 
leptokurtic and somewhat skew distribution, but the indications of skewness 
may be traced mainly and perhaps entirely to the presence of the two extreme 
variates at the upper end of the distribution, separated by about three times the 
standard deviation from the next class frequency that differs from zero. By 
TABLE 5 

Distribution of the ratios, $z = (\bar{y} - 7)/w$, in samples of $N = 5$ for Case I. 

(1) $z = (\bar{y} - 7)/w$ 
(2) $t = z\sqrt{N - 1} = 2z$ 
(3) Observed distribution 
(4) Normal distribution fitted to observed column (3) 
(5) Normal distribution of S.D. $1/\sqrt{N - 3} = 1/\sqrt{2}$ 
(6) From the “Student” theoretical distribution for $N = 5$, in same units as $z$ (measured from population mean) 

   (1)     (2)     (3)      (4)      (5)      (6) 
  -6.0     -12                                0.1 
  -5.5     -11                                0.1 
  -5.0     -10                                0.1 
  -4.5      -9                                0.2 
  -4.0      -8                                0.3 
  -3.5      -7                                0.6 
  -3.0      -6       2     0.05      0.1      1.3 
  -2.5      -5       1     0.76      0.4      2.7 
  -2.0      -4       5      3.6      6.2      7.0 
  -1.5      -3      17     27.7     32.0     21.0 
  -1.0      -2      67     98.5    105.9     70.6 
  -0.5      -1     216    210.8    217.2    217.5 
   0         0     357    279.3    275.4    356.2 
   0.5       1     226    225.7    217.2    217.5 
   1.0       2      75    111.7    105.9     70.6 
   1.5       3      22     33.7     32.0     21.0 
   2.0       4       5      6.2      6.2      7.0 
   2.5       5       1     0.75      0.4      2.7 
   3.0       6       3     0.05      0.1      1.3 
   3.5       7       0                        0.6 
   4.0       8       0                        0.3 
   4.5       9       0                        0.2 
   5.0      10       2                        0.1 
   5.5      11                                0.1 
   6.0      12                                0.1 
  Total            999    998.8    999.0    999.0 

excluding these two variates from our calculations, we obtain the following 
moment coefficients: 

$\mu_1' = 0.023571$, $\mu_3 = 0.022264$, $\beta_1 = 0.0058786$, 

$s = \sqrt{\mu_2} = 0.662202$, $\mu_4 = 1.009673$, $\beta_2 = 5.2507062$. 

In the observed distribution thus modified, by excluding the extreme upper 
class frequency 2, the evidence of skewness has disappeared. 

Case II. For our Case II we have a frequency distribution as shown in 
Table 6. 

TABLE 6 

Totals in second throws of two dice: values of the stochastic variable. 

Value        2    3    4    5    6    7    8    9   10   11   12 
Frequency    1    4    9   16   25   36   35   32   27   20   11 

Again, since with the uncorrected $\eta$, Table 6, we have $N\eta^2 = 31.5$, and 
since $N\eta^2$ is approximately distributed as $\chi^2$ with $a - 1 = 17$ degrees of freedom, 
we have odds of the order of 100 to 1 against so large a value being a mere 
sampling fluctuation. 


TABLE 7 

Correlation of mean $\bar{u}$, and standard deviation $s_u = v$, of five items for Case II. Mean of 
$\bar{u}$ = 6.971. Correlation coefficient $r_{\bar{u}v} = -0.012 \pm 0.020$. 
$\bar{v} = \bar{s}_u = 2.044$. Correlation ratio of $v$ on $\bar{u}$, $\eta_{v\bar{u}} = 0.177$ (uncorrected). 



Now proceeding to the analysis of variance, we substitute our numerical values 
derived from Table 7 in the identity 

(4) $\sum_{p=1}^{a}\sum_{i=1}^{n_p}(v_{pi} - \bar{v})^2 = \sum_{p=1}^{a} n_p(\bar{v}_p - \bar{v})^2 + \sum_{p=1}^{a}\sum_{i=1}^{n_p}(v_{pi} - \bar{v}_p)^2$ 

and obtain, in terms of class intervals as units, 

10,871 = 340 + 10,531. 

An outline of the analysis is exhibited in Table 8. 
TABLE 8 

Variance           Degrees of    Sum of     Mean      z-test 
                   freedom       squares    square 
Between arrays         17           340     20.00     ½ log_e 20.00 = 1.50 
Within arrays         982        10,531     10.72     ½ log_e 10.72 = 1.18 
Total                 999        10,871               Diff. = 0.32 


The moment coefficients and $\beta$'s which characterize the distribution in Table 6 
are: 

Mean = 7.972, $\mu_2 = 4.888$, $\mu_3 = -1.756$, $\mu_4 = 58.724$, 

$\beta_1 = 0.0264$, $\beta_2 = 2.449$. 

As in the linear case, samples of 5000 pairs of variates were drawn in sets of 
five by Miss Chesire. Analogous to Case I, our first concern is with the regression 
of the standard deviation, $s_u = v$, of $u$ from a sample of five on its mean 
value, $\bar{u}$. 

The correlation table for values of $\bar{u}$ and $v$ is shown in Table 7. The correlation 
coefficient is $r_{\bar{u}v} = -0.012 \pm 0.021$, but the uncorrected correlation ratio of $v$ on $\bar{u}$ 
is given by 

$\eta_{v\bar{u}} = 0.177.$ 

After applying the Pearson and Student corrections, we obtain the corrected 

$\eta_{v\bar{u}} = 0.131.$ 

When the sum of squares, 340, associated with variance between arrays is 
further analyzed into a part which could be represented by linear regression, 
and a part which represents deviations of the calculated means of arrays of 
$v$'s from a straight regression line of $v$ on $\bar{u}$, the deviations being measured parallel 
to the $v$-axis, we find that the part of the amount 340 represented by linear 
regression would be only $N r^2 s_v^2 = 1000\,(.000144)\,(10.871) = 1.6$. 

Since both $r = -0.012 \pm 0.021$ and the small value, 1.6, as part of the sum 
of squares 340, may well be regarded as sampling fluctuations, we revert to the 
figures of Table 8. 

The difference of the logarithms in the last column of Table 8 is 0.32, which 
corresponds to a level of significance of the general order of 100 to 1. Next, 
we calculate and plot on Table 7 the means of arrays of $v$'s to give a general 
notion of the regression of $v$ on $\bar{u}$. The location of these means of arrays suggests 
rather strongly that the regression of $v$ on $\bar{u}$ is parabolic with the curvature 
concave downward, as we should expect from the fact that $\beta_2 - \beta_1 - 3 < 0$, 
where the $\beta$'s pertain to the parent distribution. 

Next, we computed 1000 values of the “Student” ratio, $z = (\bar{u} - 7.972)/v$, 
for Case II. One of these ratios was infinite. A frequency distribution of the 
999 determinate ratios is shown in column 3, Table 9. 

The observed distribution (column 3) and the “Student” distribution (column 
6) of Table 9, to be expected in samples of $N = 5$ when samples are drawn from 
a normal distribution, are in close agreement. In fact, when we group together 
the tail frequencies of the theoretical distribution until each of them is not less 
than 5, the result of testing the goodness of fit gives $\chi^2 = 17.187$ with 11 degrees 
of freedom. This gives a value in the neighborhood of 0.1 for the probability, 
P, that as large or larger deviations than that experienced will occur, due to 
chance fluctuations, in a single repetition of the experiment. In other words, 
on the basis of this test, the indications are that we should have in the long run 
as large or larger deviations than we have experienced in this case in about 
10 per cent of a large number of sets of sampling of 1000 per set, even when the 
sampling is from a normal distribution. 
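
The grouping of tail classes before the goodness-of-fit test can be sketched as follows. This is only an illustration under our own assumptions: the function names `pool_tails` and `chi_square` are hypothetical, the threshold of 5 is the one mentioned in the paragraph above, and the short frequency lists are invented for the demonstration and are not the columns of Table 9.

```python
def pool_tails(observed, expected, minimum=5.0):
    """Merge end classes inward until each end class has expected frequency >= minimum."""
    obs, exp = list(observed), list(expected)
    # Merge the lowest class into its neighbour until it is large enough.
    while len(exp) > 1 and exp[0] < minimum:
        exp[1] += exp[0]
        obs[1] += obs[0]
        del exp[0], obs[0]
    # The same for the highest class.
    while len(exp) > 1 and exp[-1] < minimum:
        exp[-2] += exp[-1]
        obs[-2] += obs[-1]
        del exp[-1], obs[-1]
    return obs, exp


def chi_square(observed, expected):
    """Pearson's chi-square statistic for matched class frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented frequencies, for illustration only.
observed = [1, 2, 10, 20, 10, 3, 0]
expected = [0.5, 3.0, 11.0, 17.0, 11.0, 3.0, 0.5]
pooled_obs, pooled_exp = pool_tails(observed, expected)
print(pooled_obs, pooled_exp, round(chi_square(pooled_obs, pooled_exp), 4))
```

Only the end classes are pooled here; interior classes with small expectations, if any, are left as they stand.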

TABLE 9 

Distribution of the ratio, $z = (\bar{u} - 7.972)/v$, in samples of five for Case II. 

(1) $z = (\bar{u} - 7.972)/v$ 
(2) $t = z\sqrt{N - 1} = 2z$ 
(3) Observed 
(4) Normal distribution fitted to observed, column (3) 
(5) Normal distribution with S.D. $= 1/\sqrt{N - 3}$ and origin at population mean 
(6) Student's $z$-distribution for normal parent population with $N = 5$ 

   (1)     (2)     (3)      (4)      (5)      (6) 
  -5.5     -11       1                        0.1 
  -5.0     -10                                0.1 
  -4.5      -9                                0.2 
  -4.0      -8                                0.3 
  -3.5      -7                                0.6 
  -3.0      -6              0.1      0.1      1.3 
  -2.5      -5       2      0.4      0.4      2.7 
  -2.0      -4       3      4.3      6.2      7.0 
  -1.5      -3      23     25.4     32.0     21.0 
  -1.0      -2      48     92.0    105.9     70.6 
  -0.5      -1     203    205.3    217.2    217.5 
   0         0     380    278.4    275.4    356.2 
   0.5       1     226    231.4    217.2    217.5 
   1.0       2      72    117.5    105.9     70.6 
   1.5       3      24     36.5     32.0     21.0 
   2.0       4       9      6.9      6.2      7.0 
   2.5       5       3      0.8      0.4      2.7 
   3.0       6       4      0.1      0.1      1.3 
   3.5       7       1                        0.6 
   4.0       8                                0.3 
   4.5       9                                0.2 
   5.0      10                                0.1 
   5.5      11                                0.1 
  Total            999    999.1    999.0    999.0 
    ∞       ∞        1 
SUMMARY 

1. The linear correlation coefficient, r, of the mean and standard deviation 
differs insignificantly from 0 in each case. 

2. The correlation ratio of the standard deviation on the mean differs significantly 
from 0, and the regression of the standard deviation on the mean 
conforms, in its general aspects, to expectation under the theory of Neyman [12]. 

3. The indeterminate “Student” ratio of the form 0/0 in Case I and that of the 
form (constant)/0 in Case II are probably due in part to grouping into class 
intervals, but the infinite ratio would undoubtedly have had such a large value 
that it would be excluded from calculations under any one of the known criteria 
for rejection of extreme observations. 

4. Although the rejection of one indeterminate ratio in each of the two cases is 
slightly disturbing, the evidence presented by our analysis of the experimental 
sampling lends support to the view that the results of the “Student” theory are 
almost certainly applicable, for many purposes, when the parent distributions 
are of such non-normal types as are involved in our sampling. 
THE PROBLEM OF m RANKINGS 

By M. G. Kendall and B. Babington Smith 


1. Introduction. If n objects are ranked by m persons according to some 
quality of the objects there arises the problem: does the set of m rankings of n 
show any evidence of community of judgment among the m individuals? For 
example, if a number of pieces of poetry are ranked by students in order of 
preference, do the rankings support the supposition that the students have 
poetical tastes in common, and if so is there any strong degree of unanimity or 
only a faint degree? 

The problem in its full generality permits of no assumption about the nature 
of the quality according to which the objects are ranked, other than that ranking 
is possible. No hypothesis is made that the quality is measurable, still less 
that there is some underlying frequency distribution to the quantiles of which 
the rankings correspond. The quality is to be thought of as linear in the sense 
that any two objects possessing it are either coincident or may be put in the 
relation “before and after.” A metric may, of course, be imposed on this linear 
space by convention; but the relationship between objects is invariant under 
any transformation which stretches the scale of measurement. In particular, 
it is not a condition of the problem that the ranking shall be based on a 
distribution according to a normal variate. 

It is permissible to denote the rankings by the ordinal numbers 1, 2, ..., n; 
but it is not permissible, without further discussion, to operate on these numbers 
as if they were cardinals. This point seems to have been inadequately 
appreciated. For instance, when m = 2 we have the familiar case of rank 
correlation between a pair of rankings; and this is frequently treated by 
subtracting corresponding ranks, squaring, and forming the Spearman coefficient 

(1) $\rho = 1 - \dfrac{6\sum d^2}{n^3 - n}$. 

To justify this procedure it is necessary to explain what is meant, for example, 
by such a process as (4th minus 8th), or what the square of this difference of 
ordinal numbers represents. 

It is worth stressing that the necessary transition from ordinals to cardinals 
can be made without invoking a scale of measurement. When we rank an 
object as first we mean, in effect, that no member of the set of n is preferred 
to it; when we rank it as the rth we mean that (r - 1) objects are preferred 
to it. The ordinals of the ranking are then biunivocally related to the cardinals 
expressing the number of objects which are preferred. It is thus legitimate 
to apply the laws of cardinal arithmetic to them. For example, if an object A is 
ranked $r_1$ by Brown, $r_2$ by Jones and $r_3$ by Robinson we may form the sum 
$(r_1 + r_2 + r_3)$, which is to be interpreted as meaning that, taking the preferences 
of the three persons together, there were $(r_1 + r_2 + r_3 - 3)$ cases in which 
some other object was preferred to A. The point is of some importance, in 
view of the prevailing practice of replacing ranks by quantiles of the normal 
distribution, a practice which cannot always be regarded as justifiable and is 
sometimes little short of desperate. 

To fix the ideas, consider the following three rankings of six objects: 

Object:          A    B    C    D    E    F 
                 5    4    1    6    3    2 
                 2    3    1    5    6    4 
                 4    1    6    3    2    5 
Sum of ranks    11    8    8   14   11   11 


We may sum the ranks for each object, as shown. These sums (which must 
add to 63, and in general to mn(n + 1)/2) reflect the degree of resemblance 
among the rankings. If the resemblance were perfect the sums would be 3, 
6, 9, 12, 15, 18 (though not necessarily, of course, in that order) and in such a 
case would be as different as possible. On the other hand, when there is little 
or no resemblance, as in the example given, the sums are approximately equal. 
It is thus natural to take the variance of these sums as providing some measure 
of the concordance in the rankings. If S is the observed sum of squares of the 
deviations of sums of ranks from the mean value m(n + 1)/2 (i.e. is n times 
the variance) we may write 

(2) $W = \dfrac{12S}{m^2(n^3 - n)}$ 

and call W the coefficient of concordance. Here $m^2(n^3 - n)/12$ is the maximum 
possible value of S, occurring if there is complete unanimity in the rankings, 
so that W may vary from 0 to 1. In the example given, S = 25.5, W = 0.16. 
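
The arithmetic of the example can be followed in a few lines of code. The sketch below is ours and is offered only as an illustration: the function name `concordance` is hypothetical, and the three rankings are assumed to be those of the example, read so that the rank sums come to 11, 8, 8, 14, 11, 11.

```python
def concordance(rankings):
    """Return the rank sums, S and the coefficient of concordance W."""
    m = len(rankings)            # number of rankings (judges)
    n = len(rankings[0])         # number of objects ranked
    sums = [sum(column) for column in zip(*rankings)]
    mean = m * (n + 1) / 2       # mean of the rank sums
    S = sum((t - mean) ** 2 for t in sums)
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    return sums, S, W


# The three rankings of the objects A..F, as assumed from the example.
example = [
    [5, 4, 1, 6, 3, 2],
    [2, 3, 1, 5, 6, 4],
    [4, 1, 6, 3, 2, 5],
]
rank_sums, S, W = concordance(example)
print(rank_sums, S, round(W, 2))
```

Three identical rankings passed to the same function give W = 1, the case of complete unanimity.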
The coefficient W has arisen in several ways. 


(a) W is simply related to the average of the Spearman rank correlation 
coefficients between pairs of the m rankings. It is easy to show that the average 
$\rho$ is given by 

(3) $\rho_{av} = \dfrac{\dfrac{12S}{m(n^3 - n)} - 1}{m - 1}$ 

(4) $\phantom{\rho_{av}} = \dfrac{mW - 1}{m - 1}$. 

$\rho_{av}$ has been considered by Kelley [3] as a measure of average intercorrelation in 
rankings, but he gives no results for testing the significance of observed values. 
It is to be noted that whereas W may vary from 0 to 1, $\rho_{av}$ may vary from 
$-1/(m - 1)$ to 1, i.e. it is asymmetrical like the coefficient of intraclass correlation, 
to which it bears some resemblance.¹ 
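
The relation between W and the average Spearman coefficient may be verified numerically. The following sketch is ours and makes two assumptions: the relation is taken in the form $\rho_{av} = (mW - 1)/(m - 1)$, and the three rankings of six objects are those of the introductory example; the function names are hypothetical.

```python
from itertools import combinations


def spearman(a, b):
    """Spearman's coefficient 1 - 6*sum(d^2)/(n^3 - n) for two rankings."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n ** 3 - n)


def average_rho(rankings):
    """Average of Spearman's coefficient over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def coefficient_W(rankings):
    """Coefficient of concordance from the sums of ranks."""
    m, n = len(rankings), len(rankings[0])
    mean = m * (n + 1) / 2
    S = sum((sum(col) - mean) ** 2 for col in zip(*rankings))
    return 12 * S / (m ** 2 * (n ** 3 - n))


rankings = [
    [5, 4, 1, 6, 3, 2],
    [2, 3, 1, 5, 6, 4],
    [4, 1, 6, 3, 2, 5],
]
m = len(rankings)
rho_av = average_rho(rankings)
from_W = (m * coefficient_W(rankings) - 1) / (m - 1)
print(round(rho_av, 4), round(from_W, 4))
```

For this example both routes give a small negative value, as is to be expected when W lies below 1/m.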

(b) Friedman [1] has considered a quantity $\chi_r^2$ related to W by the equation 

(5) $\chi_r^2 = m(n - 1)W$. 

(c) Welch [6] and Pitman [5] have considered the problem of the distribution 
of variance in an array 

$a_1, a_2, \ldots, a_n$ 
$b_1, b_2, \ldots, b_n$ 

etc., for permutations of the numbers a, b, etc. in rows. 

This is more general than the ranking case, in which $a_1 \cdots a_n$, $b_1 \cdots b_n$, etc. 
reduce to permutations of the numbers $1 \cdots n$. Since $S'$, the total sum of 
squares in an array of m rankings of n, is $m(n^3 - n)/12$, we have 

(6) $W = \dfrac{S}{mS'}$, 

i.e. the ratio of variance between columns to the total variance. 

2. Significance of W. To test whether an observed value of W is significant 
it is necessary to consider the distribution of W (or, more conveniently, of S) 
in the universe obtained by permuting the n ranks in all possible ways. No 
generality is lost by supposing one ranking fixed, and the others will then give 
rise to $(n!)^{m-1}$ values of S. 

The actual distribution of W (or S), as will be seen below, is irregular for low 
values of m and n, and likely to be quite irregular for moderate values. It 
may, however, be shown that the first four moments of W are 


(7) $\mu_1'$ (about 0) $= \dfrac{1}{m}$ 

(8) $\mu_2 = \dfrac{2(m - 1)}{m^3(n - 1)}$ 

(9) $\mu_3 = \dfrac{8(m - 1)(m - 2)}{m^5(n - 1)^2}$ 

(10) $\mu_4 = \dfrac{24(m - 1)}{m^7(n - 1)^3}\left\{\dfrac{25n^3 - 38n^2 - 35n + 72}{25n(n + 1)} + 2(n - 1)(m - 2) + \dfrac{n + 3}{2}(m - 2)(m - 3)\right\}$. 


¹ The Spearman rank correlation coefficient is the product-moment coefficient of correlation 
between the ranks considered as ordinary variate values; $\rho_{av}$ is the intraclass correlation 
coefficient for the m sets of ranks, also considered as variate values. 
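
For small m and n the moments of W can be checked by complete enumeration of the $(n!)^{m-1}$ cases. The sketch below is ours: the formulas coded in `formula_moments_of_W` are our reading of equations (7) to (10) and are written out in its comments, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations, product


def exact_moments_of_W(n, m):
    """Mean and central moments mu2..mu4 of W by complete enumeration."""
    first = tuple(range(1, n + 1))                # one ranking held fixed
    perms = list(permutations(first))
    mean_sum = Fraction(m * (n + 1), 2)
    s_max = Fraction(m * m * (n ** 3 - n), 12)
    values = []
    for others in product(perms, repeat=m - 1):   # (n!)^(m-1) cases
        sums = [sum(column) for column in zip(first, *others)]
        S = sum((t - mean_sum) ** 2 for t in sums)
        values.append(S / s_max)
    mean = sum(values) / len(values)
    central = [sum((w - mean) ** k for w in values) / len(values) for k in (2, 3, 4)]
    return [mean] + central


def formula_moments_of_W(n, m):
    """Assumed closed forms for the first four moments of W."""
    n, m = Fraction(n), Fraction(m)
    mu1 = 1 / m                                             # mean about 0
    mu2 = 2 * (m - 1) / (m ** 3 * (n - 1))
    mu3 = 8 * (m - 1) * (m - 2) / (m ** 5 * (n - 1) ** 2)
    bracket = ((25 * n ** 3 - 38 * n ** 2 - 35 * n + 72) / (25 * n * (n + 1))
               + 2 * (n - 1) * (m - 2)
               + (n + 3) / 2 * (m - 2) * (m - 3))
    mu4 = 24 * (m - 1) / (m ** 7 * (n - 1) ** 3) * bracket
    return [mu1, mu2, mu3, mu4]


for n, m in [(3, 3), (3, 4), (4, 2)]:
    print(n, m, exact_moments_of_W(n, m) == formula_moments_of_W(n, m))
```

The enumeration grows as $(n!)^{m-1}$, so a check of this kind is practicable only for the smallest cases.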
Results equivalent to these for the first three moments were given by 
Friedman [1]; and for the first four moments by Pitman [5]. 

In a valuable contribution to the subject Friedman showed that the distribution 
of $\chi_r^2$ tends to that of $\chi^2$ with (n - 1) degrees of freedom as m tends to 
infinity and suggested the use of $\chi_r^2$ (equation (5)) for an ordinary test of 
significance in the $\chi^2$ distribution. This is satisfactory for moderately large values, 
but for small values it is subject to the disadvantage inherent in any attempt 
to represent a distribution of finite range by one of infinite range: the fit near 
the tails is not likely to be very good. An improvement is obtained by noting 
that the first four moments of the Type I distribution, 

(11) $dF = \dfrac{1}{B(p, q)}\,W^{p-1}(1 - W)^{q-1}\,dW$, 

are approximately those of W if m and n are moderately large, and 


(12) $p = \dfrac{n - 1}{2} - \dfrac{1}{m}$, 

(13) $q = (m - 1)\left\{\dfrac{n - 1}{2} - \dfrac{1}{m}\right\}$. 

For practical purposes it is most convenient to put 

(14) $z = \tfrac{1}{2}\log_e \dfrac{(m - 1)W}{1 - W}$, 

so that z can be tested in Fisher's distribution with $(n - 1) - \dfrac{2}{m}$ $(= \nu_1)$ and 
$(m - 1)\left\{(n - 1) - \dfrac{2}{m}\right\}$ $(= \nu_2)$ degrees of freedom. 

There can be little doubt that this test is quite reliable for moderate values 
of m and n; but it has hitherto been far from clear how reliable it is for low 
values of m and n. This point we attempt to clear up in the present paper. 


3. Distribution of S. For the case m = 2 the distribution of S is the same 
as the distribution of the $\sum d^2$ used in calculating Spearman's rank correlation 
coefficient. A table showing the distribution up to and including n = 8 has 
already been given (Kendall and others, [4]). Tables giving probabilities that 
specified values of $\chi_r^2$ would be attained or exceeded were given by Friedman [1] 
for n = 3, m = 2-9; and n = 4, m = 2-4. We have taken this work somewhat 
further and obtained the distributions for n = 3, m = 2-10; n = 4, m = 2-6; 
and n = 5, m = 3. Tables 1-4 give the probabilities based on these 
distributions. 

These distributions were obtained by two methods. The first consisted of 
building up the distribution for (m + 1) and n from that of m and n. For 
TABLE 1 

Probability that a given value of S will be attained or exceeded, for n = 3 and values 
of m from 2 to 10 

   S      m = 2      m = 3      m = 4      m = 5      m = 6      m = 7      m = 8      m = 9     m = 10 
   0      1.000      1.000      1.000      1.000      1.000      1.000      1.000      1.000      1.000 
   2       .833       .944       .931       .954       .956       .964       .967       .971       .974 
   6       .500       .528       .653       .691       .740       .768       .794       .814       .830 
   8       .167       .361       .431       .522       .570       .620       .654       .685       .710 
  14                  .194       .273       .367       .430       .486       .531       .569       .601 
  18                  .028       .125       .182       .252       .305       .355       .398       .436 
  24                             .069       .124       .184       .237       .285       .328       .368 
  26                             .042       .093       .142       .192       .236       .278       .316 
  32                            .0046       .039       .072       .112       .149       .187       .222 
  38                                        .024       .052       .085       .120       .154       .187 
  42                                       .0085       .029       .051       .079       .107       .135 
  50                                      .00077       .012       .027       .047       .069       .092 
  54                                                  .0081       .021       .038       .057       .078 
  56                                                  .0055       .016       .030       .048       .066 
  62                                                  .0017      .0084       .018       .031       .046 
  72                                                 .00013      .0036      .0099       .019       .030 
  74                                                             .0027      .0080       .016       .026 
  78                                                             .0012      .0048       .010       .018 
  86                                                            .00032      .0024      .0060       .012 
  96                                                            .00032      .0011      .0035      .0075 
  98                                                           .000021     .00086      .0029      .0063 
 104                                                                       .00026      .0013      .0034 
 114                                                                      .000061     .00066      .0020 
 122                                                                      .000061     .00035      .0013 
 126                                                                      .000061     .00020     .00083 
 128                                                                     .0000036    .000097     .00051 
 134                                                                                 .000054     .00037 
 146                                                                                 .000011     .00018 
 150                                                                                 .000011     .00011 
 152                                                                                 .000011    .000085 
 158                                                                                 .000011    .000044 
 162                                                                               .00000060    .000020 
 168                                                                                            .000011 
 182                                                                                           .0000021 
 200                                                                                         .000000099 
TABLE 2 

Probability that a given value of S will be attained or exceeded, for n = 4 and 
m = 3 and 5 

   S     m = 3     m = 5         S       m = 5 
   1     1.000     1.000        61        .055 
   3      .958      .975        65        .044 
   5      .910      .944        67        .034 
   9      .727      .857        69        .031 
  11      .608      .771        73        .023 
  13      .524      .709        75        .020 
  17      .446      .652        77        .017 
  19      .342      .561        81        .012 
  21      .300      .521        83       .0087 
  25      .207      .445        85       .0067 
  27      .175      .408        89       .0055 
  29      .148      .372        91       .0031 
  33      .075      .298        93       .0023 
  35      .054      .260        97       .0018 
  37      .033      .226        99       .0016 
  41      .017      .210       101       .0014 
  43     .0017      .162       105      .00064 
  45     .0017      .141       107      .00033 
  49                .123       109      .00021 
  51                .107       113      .00014 
  53                .093       117     .000048 
  57                .075       125    .0000030 
  59                .067 




example, with m = 2 and n = 3 we have the following values of the sums of 
ranks, measured about their mean: 

      Type          Frequency 
  -2    0    2          1 
  -2    1    1          2 
  -1    0    1          2 
   0    0    0          1 

Here -2, 1, 1 and 2, -1, -1 are taken to be identical types, for they give the 
same value of S and will also give similar types when we proceed to m = 3 as 
follows. 

In the case m = 3 each of the above types will appear added to the six 
permutations of -1, 0, 1; e.g. the type -2, 0, 2 will give one each of -3, 0, 3; -3, 1, 2; 
TABLE 3 

Probability that a given value of S will be attained or exceeded, for n = 4 and 
m = 2, 4 and 6 

   S     m = 2     m = 4     m = 6         S        m = 6 
   0     1.000     1.000     1.000        82         .035 
   2      .958      .992      .996        84         .032 
   4      .833      .928      .957        86         .029 
   6      .792      .900      .940        88         .023 
   8      .625      .800      .874        90         .022 
  10      .542      .754      .844        94         .017 
  12      .458      .677      .789        96         .014 
  14      .375      .649      .772        98         .013 
  16      .208      .524      .679       100         .010 
  18      .167      .508      .668       102        .0096 
  20      .042      .432      .609       104        .0085 
  22                .389      .574       106        .0073 
  24                .355      .541       108        .0061 
  26                .324      .512       110        .0057 
  30                .242      .431       114        .0040 
  32                .200      .386       116        .0033 
  34                .190      .375       118        .0028 
  36                .158      .338       120        .0023 
  38                .141      .317       122        .0020 
  40                .105      .270       126        .0015 
  42                .094      .256       128       .00090 
  44                .077      .230       130       .00087 
  46                .068      .218       132       .00073 
  48                .054      .197       134       .00065 
  50                .052      .194       136       .00040 
  52                .036      .163       138       .00036 
  54                .033      .155       140       .00028 
  56                .019      .127       144       .00024 
  58                .014      .114       146       .00022 
  62                .012      .108       148       .00012 
  64               .0069      .089       150      .000095 
  66               .0062      .088       152      .000062 
  68               .0027      .073       154      .000046 
  70               .0027      .066       158      .000024 
  72               .0016      .060       160      .000016 
  74              .00094      .056       162      .000012 
  76              .00094      .043       164     .0000080 
  78              .00094      .041       170     .0000024 
  80             .000072      .037       180    .00000013 
TABLE 4 

Probability that a given value of S will be attained or exceeded, for n = 5 and m = 3 

   S     m = 3         S      m = 3 
   0     1.000        44       .236 
   2     1.000        46       .213 
   4      .988        48       .172 
   6      .972        50       .163 
   8      .941        52       .127 
  10      .914        54       .117 
  12      .845        56       .096 
  14      .831        58       .080 
  16      .768        60       .063 
  18      .720        62       .056 
  20      .682        64       .045 
  22      .649        66       .038 
  24      .595        68       .028 
  26      .559        70       .026 
  28      .493        72       .017 
  30      .475        74       .015 
  32      .432        76      .0078 
  34      .406        78      .0053 
  36      .347        80      .0040 
  38      .326        82      .0028 
  40      .291        86     .00090 
  42      .253        90    .000069 


-2, -1, 3; -2, 1, 1; -1, -1, 2; and -1, 0, 1. These types are then counted 
for each of the four basic types of m = 2 and we get: 

      Type          Frequency 
  -3    0    3          1 
  -3    1    2          6 
  -2    0    2          6 
  -2    1    1          6 
  -1    0    1         15 
   0    0    0          2 

The case m = 4 is treated by considering the numbers of types obtained by 
adding the six permutations of -1, 0, 1 to the types for m = 3; and so on. 

This method is quite convenient for n = 2 and n = 3. For n = 4 it becomes 
difficult owing to the labour of considering 24 permutations at each stage and to 
the increase in the number of types. For n = 5 there are 120 permutations and 
the labour becomes excessive. 
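
The building-up method just described lends itself to mechanical treatment. The following is a sketch of our own devising and not the authors' procedure in detail: the names `canonical` and `build_up` are hypothetical, deviations are held in doubled units so that they remain whole numbers when n is even, and a type is identified with its reflection as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def canonical(deviations):
    """Represent a type by its sorted form, identifying it with its reflection."""
    a = tuple(sorted(deviations))
    b = tuple(sorted(-x for x in deviations))
    return min(a, b)


def build_up(n, m):
    """Types and distribution of S for m rankings of n, one ranking fixed."""
    # Doubled deviations 2r - (n + 1) of the ranks from their mean.
    perms = [tuple(2 * r - (n + 1) for r in p) for p in permutations(range(1, n + 1))]
    types = Counter({canonical(perms[0]): 1})
    for _ in range(m - 1):
        nxt = Counter()
        for t, f in types.items():
            for p in perms:      # add every permutation to the current type
                nxt[canonical(tuple(a + b for a, b in zip(t, p)))] += f
        types = nxt
    dist = Counter()
    for t, f in types.items():
        dist[Fraction(sum(x * x for x in t), 4)] += f   # undo the doubling
    return types, dist


_, dist3 = build_up(3, 3)
print(sorted((int(s), f) for s, f in dist3.items()))
```

Because only distinct types are carried forward, the work at each stage depends on the number of types and not on the full $(n!)^{m-1}$ arrangements, which is what makes the case n = 3 manageable up to m = 10.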
The second method employed is a generalisation of a procedure we used for the 
Spearman coefficient. Taking first of all the case m = 2, consider the array 

$$\begin{array}{cccc} a^{2} & a^{3} & \cdots & a^{n+1} \\ a^{3} & a^{4} & \cdots & a^{n+2} \\ \vdots & \vdots & & \vdots \\ a^{n+1} & a^{n+2} & \cdots & a^{2n} \end{array}$$ 

Any permissible set of values of the sums of ranks is obtained by selecting n 
entries from this array so that no entry appears more than once in the same row 
or column. If then, subtracting from each index the mean (n + 1) and squaring, 
we write 

(15) $$E = \left|\begin{array}{cccc} a^{(n-1)^2} & a^{(n-2)^2} & \cdots & a^{0} \\ a^{(n-2)^2} & a^{(n-3)^2} & \cdots & a^{1} \\ \vdots & \vdots & & \vdots \\ a^{0} & a^{1} & \cdots & a^{(n-1)^2} \end{array}\right|$$ 

the values of S are the powers of a in E when it is expanded as a sum of n! terms 
each of which is obtained by multiplying n factors which do not appear in the 
same row or column. The distribution of S is arrayed by the expansion of E, 
the number of values of any S being the coefficient of $a^S$ in the expansion. 
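
For m = 2 the expansion of E can be carried out by running through the n! ways of choosing one entry from each row and column. The sketch below is ours, with the hypothetical name `expand_E`; it assumes that the entry in row i and column j carries the exponent $(i + j - n - 1)^2$, as the construction of E implies.

```python
from collections import Counter
from itertools import permutations


def expand_E(n):
    """Coefficient of a**S in the expansion of E for m = 2, keyed by S."""
    coeff = Counter()
    for cols in permutations(range(1, n + 1)):
        # One entry from each row, no two in the same column.
        S = sum((row + col - (n + 1)) ** 2 for row, col in enumerate(cols, start=1))
        coeff[S] += 1
    return coeff


print(sorted(expand_E(4).items()))
```

Dividing each coefficient by n! gives the probability of the corresponding value of S.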

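Similarly, for m rankings, the distribution of S is given by the expansion of an 
m-dimensional E-function. For example, with m = 3 there would be a 
three-dimensional E-function the bottom plane of which would be 

$$\begin{array}{ccc} a^{\{3 - 3(n+1)/2\}^2} & \cdots & a^{\{n+2 - 3(n+1)/2\}^2} \\ a^{\{4 - 3(n+1)/2\}^2} & \cdots & a^{\{n+3 - 3(n+1)/2\}^2} \\ \vdots & & \vdots \\ a^{\{n+2 - 3(n+1)/2\}^2} & \cdots & a^{\{2n+1 - 3(n+1)/2\}^2} \end{array}$$ 

The plane above this would be 

$$\begin{array}{ccc} a^{\{4 - 3(n+1)/2\}^2} & \cdots & a^{\{n+3 - 3(n+1)/2\}^2} \\ \vdots & & \vdots \\ a^{\{n+3 - 3(n+1)/2\}^2} & \cdots & a^{\{2n+2 - 3(n+1)/2\}^2} \end{array}$$ 

and so on. 

The E-function is difficult to handle in more than three dimensions, but for 
the two and three dimensional case it is manageable and we used it to obtain 
the distribution of S for n = 5 and m = 3. 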


4. Adequacy of the z-test. Tables 1-4 provide exact tests for the values of 
m and n there given. It remains to be seen how good the ordinary z-test applied 
to W would be for higher values. It may be presumed that if the test is 
satisfactory for any particular value of m and n for which exact results are available 
it will be so for higher values of m and n. Since, for ordinary purposes, the 
significance points of z as tabled by Fisher and Yates [2] would be employed, 
the most useful comparison would seem to be between those tables and the 
extreme values of Tables 1-4. 

For n = 3, m = 9, the 1% level is given approximately by S = 78. We have, 
testing for such a value, W = 0.4814, z = 1.002, $\nu_1 = 16/9$, $\nu_2 = 128/9$. By linear 
interpolation of reciprocals in the tables of z we find for the 1% point and these 
degrees of freedom z = 0.954. The correspondence is hardly satisfactory, and 
the z-test might lead to incorrect inferences in practice. Matters improve a 
good deal, however, if we make continuity corrections, by subtracting unity 
from S before calculating W, and increasing by two the divisor $m^2(n^3 - n)/12$, 
so as to allow for the finite range. In this case z = 0.979. 

For n = 4, m = 6 the 1% point is approximately S = 100. We have W = 
0.5556, z = 0.916, $\nu_1 = 8/3$, $\nu_2 = 40/3$. By linear interpolation as before we 
find z = 0.888. 

Continuity corrections again materially improve the agreement, giving a 
value of z = 0.893. 

For n = 5, m = 3 there is no very convenient value of S close to the 1% point. 
For P = 0.015, S = 74 and for P = 0.0078, S = 76. 

For S = 74 (with continuity corrections) z = 1.020 
    S = 76 (with continuity corrections) z = 1.089 

By interpolation from the tables z = 1.075. The use of the z-test would lead 
to the correct conclusion that a value of S equal to 74 falls below, and that of 
76 above, the 1% point. 
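
The values of z quoted in this section, with and without continuity corrections, can be recomputed as follows. The sketch is ours and rests on stated assumptions: z is taken as $\tfrac{1}{2}\log_e\{(m - 1)W/(1 - W)\}$, the degrees of freedom as $(n - 1) - 2/m$ and $(m - 1)$ times that quantity, and the corrections as described above (unity subtracted from S, two added to the divisor); the function name `z_from_S` is hypothetical. Significance points must still be taken from tables of z.

```python
import math


def z_from_S(S, m, n, continuity=False):
    """Return W, z and the degrees of freedom (nu1, nu2) for the z-test."""
    s_max = m * m * (n ** 3 - n) / 12      # largest possible value of S
    if continuity:
        W = (S - 1) / (s_max + 2)          # continuity corrections
    else:
        W = S / s_max
    z = 0.5 * math.log((m - 1) * W / (1 - W))
    nu1 = (n - 1) - 2 / m
    nu2 = (m - 1) * nu1
    return W, z, nu1, nu2


for n, m, S in [(3, 9, 78), (4, 6, 100), (5, 3, 74), (5, 3, 76)]:
    _, z_plain, nu1, nu2 = z_from_S(S, m, n)
    _, z_corr, _, _ = z_from_S(S, m, n, continuity=True)
    print(f"n={n} m={m} S={S}: z={z_plain:.3f}, corrected z={z_corr:.3f}, "
          f"nu1={nu1:.3f}, nu2={nu2:.3f}")
```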

For values of m and n not included in Tables 1-4 it thus appears that the 
z-test with continuity corrections will give sufficiently accurate results, if n is 
greater than 3, at the 1% points. It may be presumed that the results at the 5% 
points are equally good and probably better. But for finer values of 
significance, such as 0.1%, it is doubtful whether the test is sound. The tails of the 
distribution of S for moderate values of m and n are very irregular. 

For instance, the following is the tail of the distribution of S for n = 3, m = 10 
(the total distribution being 10,077,696): 


   S   Frequency        S   Frequency 
  96      11,340      146         740 
  98      30,090      150         252 
 104      13,830      152         420 
 114       7,380      158         240 
 122       4,200      162          90 
 126       3,240      168          90 
 128       1,450      182          20 
 134       1,860      200           1 
and the following is the tail for n = 4, m = 6 (the total being 7,962,624): 

   S   Frequency        S   Frequency        S   Frequency 
 100       5536       122       4100       146        810 
 102       8160       126       4480       148        225 
 104      10260       128        240       150        264 
 106       8860       130       1152       152        120 
 108       3920       132        660       154        180 
 110      13344       134       1980       158         60 
 114       5460       136        300       160         36 
 116       3870       138        600       162         30 
 118       3900       140        312       164         45 
 120       2472       144        100       170         18 
                                           180          1 


Irregularities of this kind run all through the distributions we have obtained, 
and frequency diagrams present the same sort of features we have noticed in 
the case m = 2 (Kendall and others, [4]). The representation of such 
distributions by continuous functions, no matter how close their lower moments, is 
obviously to be used with some care. Although the B-distribution or the 
associated z-distribution will give reasonable significance tests at levels of 1% or 
greater, they will probably be inadequate to represent frequencies occurring in 
narrow ranges. 

5. Some Experimental Distributions. In some previous work we obtained a 
number of random permutations of the numbers 1-10 and 1-20. These were 
used to derive some experimental distributions of S which may be worth 
recording. Table 5 gives the distribution for 200 sets of pentads of 10 and 
Table 6 that for 100 triads of 20. In the distribution of Table 5, the mean of 
the grouped distribution is 404. The theoretical mean is 412.5 with a standard 
error of 12.3. In Table 6 the mean is 1936, the theoretical mean being 1995 
with s.e. 53. The distributions accord quite well with expectation. 
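
The theoretical means and standard errors quoted here follow from the first two moments of W. The sketch below is ours: it assumes a mean of 1/m and a variance of $2(m - 1)/\{m^3(n - 1)\}$ for W, scales them by the maximum of S, and divides the standard deviation by the square root of the number of sets; the function name is hypothetical.

```python
import math


def mean_and_se_of_S(m, n, sets):
    """Theoretical mean of S and the standard error of the mean of `sets` values."""
    s_max = m * m * (n ** 3 - n) / 12            # S = W * s_max
    mean_S = s_max / m                           # since the mean of W is 1/m
    var_W = 2 * (m - 1) / (m ** 3 * (n - 1))     # assumed variance of W
    sd_S = s_max * math.sqrt(var_W)
    return mean_S, sd_S / math.sqrt(sets)


print(mean_and_se_of_S(5, 10, 200))   # the 200 pentads of 10
print(mean_and_se_of_S(3, 20, 100))   # the 100 triads of 20
```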

In conclusion we give two examples to illustrate some points of importance 
in ranking work. The first is a case in which ranks appear as the primary 
variate and in which the assumption of normality is clearly illegitimate. 

6. Example 1. In some experiments in random series a pack of ordinary 
playing cards was shuffled and the order of the 13 cards of each suit from the 
top of the pack was noted. The pack was then reshuffled and again the orders 
noted. This was done 28 times. The question we wished to discuss was 
whether the shuffling was good, in the sense that the cards were thoroughly 
mixed at each shuffle. 

Here, for each suit, say diamonds, we have 28 rankings of 13. The sums of 
ranks were 183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220. The 
mean is 196, and S = 11522, W (without continuity corrections, which are not 
TABLE 5 


Experimental Distribution of S in 
200 sets (m = 5, n = 10) 


s 

Frequency 

0- 

1 

50- 

2 


7 

160- 

9 


21 


22 


24 

350- 

26 



450- 

17 


12 

550- 

11 



650- 

4 


5 

750- 

3 

800- 

3 

1000- 

2 

1250- 

1 

Total 

200 


TABLE 6 


Experimental Distribution of S in 
100 sets (m = 3, n = 20) 


-S 

Frequency 

800- 

4 " 

1000- 

8 

1200- 

8 

1400- 

6 

1600- 

12 

1800- 

15 

2000- 

20 


12 


6 


5 


0 


3 


0 


1 

Total 

100 


worth making for these values of m and n) = 0.08075, z (equation (14)) = 0.432. 
This falls just beyond the 1% point. 

Similarly for the clubs W was found to be 0.0535; for the hearts, 0.0245; and
for spades, 0.0342. None of these values is significant and we conclude that the
randomisation introduced by the shuffling was good, at all events, so far as this
test was concerned. It may be added that the shuffling was done with much 
more care than would be taken in an ordinary game of cards. 

In psychological work there has sometimes been a confusion between the
determination of a measure of agreement between subjects and that of an
objective order based on experimental rankings. It may therefore be as well to
point out that in its psychological applications the test of W is one of
concordance between judgments. There may be quite a high measure of agreement
about something which is incorrect.
7. Example 2. A number of students were given 12 photographs of persons
unknown to them, and asked to rank them in what they judged from the
photographs to be their intelligence. For 16 students the sums of ranks were

112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124

The mean is 104, S = 4472, W = 0.1222, z = 0.368, and is barely significant,
being between the 1% and the 5% points.
For 111 students the sums were

818, 670, 908, 410, 706, 526, 780, 485, 596, 1044, 959, 756

W = 0.2378, z = 1.768

This is highly significant and it is to be inferred that community of judgment
exists between students or groups of students. But there was little relationship
between the judgments and the intelligence of the photographed subjects as
given by the Binet Intelligence Quotient.
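
The arithmetic of Examples 1 and 2 can be retraced with a short sketch. It
assumes that W is computed without continuity correction as
12S / (m^2 (n^3 - n)) and that equation (14), which is not reproduced in this
part of the text, has the form z = (1/2) log_e[(m - 1)W / (1 - W)]; that form
is an assumption, adopted because it agrees with the quoted values. The
function name `concordance` is illustrative only.

```python
import math

def concordance(rank_sums, m):
    """Return (S, W, z) for m rankings of n objects, given the n sums of ranks."""
    n = len(rank_sums)
    mean = m * (n + 1) / 2                      # expected sum of ranks per object
    S = sum((r - mean) ** 2 for r in rank_sums)
    W = 12 * S / (m ** 2 * (n ** 3 - n))        # no continuity correction
    z = 0.5 * math.log((m - 1) * W / (1 - W))   # assumed form of equation (14)
    return S, W, z

diamonds = [183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220]
students_16 = [112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124]
students_111 = [818, 670, 908, 410, 706, 526, 780, 485, 596, 1044, 959, 756]

for sums, m in ((diamonds, 28), (students_16, 16), (students_111, 111)):
    S, W, z = concordance(sums, m)
    print(m, S, round(W, 5), round(z, 3))
```

Only the sums of ranks and the number of rankings are needed, so the same
function serves for any of the suits or groups of judges.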

Note added in proof:

While this paper was passing through the press Professor W. Allen Wallis, of Stanford
University, kindly drew our attention to some unpublished work of his own on this
subject. Professor Wallis had also arrived at the coefficient W which, he points out, is the
ranking analogue of the correlation ratio. His paper is, we understand, on the point of
publication.
NOTES

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


THE ALLOCATION OF SAMPLINGS AMONG SEVERAL STRATA 

By J. Stevens Stock and Lester R. Frankel 

1. Introduction. The problem of selecting a random sample so as to obtain 
optimum precision in making estimates has been the subject of inquiries by 
Bowley,¹ Neyman,² Sukhatme³ and others. In estimating an average value of a
variate in a population it is often profitable to stratify the universe into several 
homogeneous parts and sample at random within each of these parts. In order 
to obtain maximum efficiency for a given size of sample it appears that the 
number of samplings from each stratum should be proportional to the standard 
deviation of the characteristic under consideration and to the total number of 
units within the stratum. By distributing the sample in such a manner optimum 
precision will be obtained in estimating a general average. 

However, it often happens that it is not the purpose of an investigation to 
study the aggregate of the universe. Evaluations and interrelations of char- 
acteristics in different groups or strata within the universe may be of importance. 
Thus, in cost-of-living surveys in a number of urban centers the object is to 
compare costs among the cities of different backgrounds. In such cases it is 
desirable for each city to have equal reliability so that each one may be treated 
as a unit. There are many other situations in the social sciences where analyses 
of this type are of importance. 

2. The Problem. In general, the sampling problem is: Given several well 
defined areas of study and a fixed number of observations with which to make 
the survey, how best to distribute the observations such that each area will be 
represented with equal precision. 

There are n observations to be distributed among m areas or strata. In the 


¹ A. L. Bowley, "Measurement of the precision attained in sampling," Bulletin de
l'Institut International de Statistique, 1926, Rome, Tome XXII, 1-ère Livraison,
3-ème partie, pp. 1-62 (supplement).

² J. Neyman, "On the two different aspects of the representative method," Journal of
the Royal Statistical Society, 1934, pp. 558-625.

³ P. V. Sukhatme, "Contribution to the theory of the representative method,"
Supplement to the Journal of the Royal Statistical Society, Vol. II, 1936, pp. 253-268.
$i$-th stratum, if $N_i$ is the total number of units, $S_i^2$ the variance of the
characteristic to be measured, and $n_i$ the size of the sample, the sampling error
of the arithmetic mean is

(1) $\sigma_i = S_i \sqrt{\dfrac{N_i - n_i}{n_i (N_i - 1)}}$.

The problem then is, given $N_i$, numbers proportional to $S_i^2$, and $n$, to determine
$n_i$ such that

$\sigma_1 = \sigma_2 = \cdots = \sigma_m$.


3. First Solution. If we assume that the variances are all equal and that
for $N_i - 1$ we may substitute $N_i$, we have

(2) $\dfrac{N_1 - n_1}{n_1 N_1} = \dfrac{N_2 - n_2}{n_2 N_2} = \cdots = \dfrac{N_m - n_m}{n_m N_m}$.

From the total amount of money available and the cost per sampling unit we can
determine the total number of observations to be apportioned among the m
populations

(3) $n = \sum_{i=1}^{m} n_i$.

We are able then to write m equations in m unknowns:

From (2) we may write m - 1 equations

(4) $\dfrac{1}{n_1} - \dfrac{1}{N_1} = \dfrac{1}{n_i} - \dfrac{1}{N_i}$   $(i = 2, 3, \ldots, m)$

and from (3) we may write one equation,

(5) $n_1 + n_2 + \cdots + n_m = n$.

But equations (4) are not easily soluble in their present form; they can be made
linear by writing the approximation

$\dfrac{1}{n_i} = \dfrac{1}{L_i (1 + \alpha_i)} \approx \dfrac{1 - \alpha_i}{L_i}$,

where $L_i$ is some reasonable approximation of $n_i$ chosen such that

$\sum_{1}^{m} L_i = \sum_{1}^{m} n_i$

and $\alpha_i$ is some small correction for $L_i$ to be determined. We have then
approximately,

(6) $\dfrac{1 - \alpha_1}{L_1} - \dfrac{1}{N_1} = \dfrac{1 - \alpha_i}{L_i} - \dfrac{1}{N_i}$   $(i = 2, 3, \ldots, m)$

and from equation (5) we get

(7) $\alpha_1 L_1 + \alpha_2 L_2 + \cdots + \alpha_m L_m = 0$.
If we write

$\phi_i = L_1 - L_i + L_1 L_i \left( \dfrac{1}{N_1} - \dfrac{1}{N_i} \right)$,

we may write (6) and (7) in the following form:

(8)
$-L_2 \alpha_1 + L_1 \alpha_2 = \phi_2$
$-L_3 \alpha_1 + L_1 \alpha_3 = \phi_3$
$\cdots$
$-L_m \alpha_1 + L_1 \alpha_m = \phi_m$
$L_1 \alpha_1 + L_2 \alpha_2 + L_3 \alpha_3 + \cdots + L_m \alpha_m = 0$

The matrix of the coefficients is

(9)
$\begin{pmatrix}
-L_2 & L_1 & 0 & \cdots & 0 \\
-L_3 & 0 & L_1 & \cdots & 0 \\
\vdots & & & & \vdots \\
-L_m & 0 & 0 & \cdots & L_1 \\
L_1 & L_2 & L_3 & \cdots & L_m
\end{pmatrix}$

From this matrix we find that

(10) $\alpha_1 = - \dfrac{\sum_{2}^{m} L_i \phi_i}{\sum_{1}^{m} L_i^2}$

and from the general form of equation (8) we have

(11) $\alpha_i = \dfrac{\phi_i + L_i \alpha_1}{L_1}$.

These two equations (10) and (11) give us all the $\alpha_i$. It is then only necessary
to compute the second approximations of $n_i$ by

(12) $L_i' = L_i (1 + \alpha_i) \approx n_i$.

Closer approximations, though perhaps unnecessary, can be made by repeating
the computation with the next approximations. The final approximations
may be checked by substituting them in equations (4).
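
The first solution lends itself to a short computational sketch. It takes
phi_i = L_1 - L_i + L_1 L_i (1/N_1 - 1/N_i), which is the form one obtains by
clearing fractions in (6); this is an assumption insofar as the printed
definition is not legible here. The stratum sizes and initial approximations
are those read from Table 1 of the example, the function name is illustrative,
and the output is not claimed to agree with the printed corrections to the
last digit.

```python
def first_solution_step(N, L):
    """One pass of equations (10), (11) and (12) for equal variances."""
    L1, N1 = L[0], N[0]
    # phi_1 is zero by construction, so it can be carried in the same list.
    phi = [L1 - Li + L1 * Li * (1 / N1 - 1 / Ni) for Li, Ni in zip(L, N)]
    alpha1 = (-sum(Li * ph for Li, ph in zip(L[1:], phi[1:]))
              / sum(Li ** 2 for Li in L))                          # equation (10)
    alpha = [(ph + Li * alpha1) / L1 for Li, ph in zip(L, phi)]    # equation (11)
    return [Li * (1 + a) for Li, a in zip(L, alpha)], alpha        # equation (12)

N = [5500, 9000, 12500, 15000, 21000, 31000]   # units per stratum (Table 1)
L = [4000, 5500, 6000, 6500, 8000, 10000]      # initial approximations (Table 1)

for _ in range(6):                             # repeat until the corrections vanish
    L, alpha = first_solution_step(N, L)

quotas = [round(x) for x in L]
# Check by substitution in equations (4): 1/n_i - 1/N_i should be equal for all i.
check = [1 / n_i - 1 / N_i for n_i, N_i in zip(L, N)]
print(quotas, max(check) - min(check))
```

Because the corrections satisfy (7), each pass leaves the total of the
approximations unchanged, so no rescaling is needed between passes.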


4. Second Solution. Sometimes the numbers $S_i^2$ are known or at least
proportionate numbers can be estimated with a fair degree of accuracy for each area.
We shall call these proportionate numbers $s_i^2$.* We now have the conditions

(13) $s_1^2 \dfrac{N_1 - n_1}{n_1 N_1} = s_2^2 \dfrac{N_2 - n_2}{n_2 N_2} = \cdots = s_m^2 \dfrac{N_m - n_m}{n_m N_m}$

and as before

$\sum_{1}^{m} n_i = n$.

We may write m equations in m unknowns, $\alpha_i$, using the approximations $L_i$
as before:

(14)
$-s_1^2 L_2 \alpha_1 + s_2^2 L_1 \alpha_2 = \theta_2$
$-s_1^2 L_3 \alpha_1 + s_3^2 L_1 \alpha_3 = \theta_3$
$\cdots$
$-s_1^2 L_m \alpha_1 + s_m^2 L_1 \alpha_m = \theta_m$
$L_1 \alpha_1 + L_2 \alpha_2 + \cdots + L_m \alpha_m = 0$

where

(15) $\theta_i = s_i^2 L_1 - s_1^2 L_i + L_1 L_i \left( \dfrac{s_1^2}{N_1} - \dfrac{s_i^2}{N_i} \right)$.

Solving these m linear equations for $\alpha_i$ we get

$\alpha_1 = - \dfrac{\sum_{2}^{m} \theta_i L_i / s_i^2}{s_1^2 \sum_{1}^{m} L_i^2 / s_i^2}$

and from the general form of equations (14) we have

$\alpha_i = \dfrac{\theta_i + s_1^2 L_i \alpha_1}{s_i^2 L_1}$.

These $\alpha_i$ may be applied as before to the approximations $L_i$ for new
approximations $L_i'$ of the numbers $n_i$.
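
A sketch of the second solution follows. It assumes
theta_i = s_i^2 L_1 - s_1^2 L_i + L_1 L_i (s_1^2/N_1 - s_i^2/N_i), the form
obtained by clearing fractions in the linearized conditions (13), and it uses
purely hypothetical proportionate variances; only the stratum sizes and the
starting values are taken from Table 1. The function name is illustrative.

```python
def second_solution_step(N, L, s2):
    """One pass of the second solution; s2 holds the proportionate variances."""
    L1, N1, s1 = L[0], N[0], s2[0]
    # theta_1 is zero by construction.
    theta = [si * L1 - s1 * Li + L1 * Li * (s1 / N1 - si / Ni)
             for Li, Ni, si in zip(L, N, s2)]
    alpha1 = (-sum(Li * th / si for Li, th, si in zip(L[1:], theta[1:], s2[1:]))
              / (s1 * sum(Li ** 2 / si for Li, si in zip(L, s2))))
    alpha = [(th + s1 * Li * alpha1) / (si * L1)
             for Li, th, si in zip(L, theta, s2)]
    return [Li * (1 + a) for Li, a in zip(L, alpha)]    # corrections as in (16)

N = [5500, 9000, 12500, 15000, 21000, 31000]     # units per stratum (Table 1)
L = [4000, 5500, 6000, 6500, 8000, 10000]        # starting approximations
s2 = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]              # hypothetical, for illustration only

for _ in range(10):
    L = second_solution_step(N, L, s2)

# Conditions (13): s_i^2 (N_i - n_i) / (n_i N_i) should be the same in every stratum.
check = [si * (Ni - ni) / (ni * Ni) for si, Ni, ni in zip(s2, N, L)]
print([round(x) for x in L], max(check) - min(check))
```

With all the s2 equal the step reduces to the first solution, which offers a
convenient consistency check on an implementation.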

5. Remarks. (i) In either case the corrections to the approximations $L_i$
may be applied in two different ways:

(16) $L_i' = L_i (1 + \alpha_i)$

(17) $L_i' = \dfrac{L_i}{1 - \alpha_i}$

When the corrections are applied according to (16) the sum of the new
approximations adds up correctly to the total n, and no further adjustment need be
made in the $L_i'$ either for repeating the operation again for nearer
approximations or for final use. If, however, the corrections are relatively large, say
greater than .10, there seems to be better convergence with the second
approximations if formula (17) is used and the resulting $L_i'$ adjusted proportionately
such that they add up to n. These numbers then can again be adjusted with
new $\alpha_i$ for final approximations.

* The numbers $S_i^2$ and $s_i^2$ may be used interchangeably since they are by
hypothesis proportional.

(ii) The numbers $S_i^2$ or $s_i^2$ are not always estimable. If they are not known
at all or are known to be all nearly equal the first solution is perhaps the more
useful. If these numbers are known, and known to be different, the second
solution is necessary. However, some saving in computation by the second
method may be effected if the approximations $L_i$ are first adjusted by the first
solution before being entered into the computation of the second solution.

(iii) Further accuracy, though perhaps unnecessary, may be attained in the
second solution by substituting throughout $s_i'^2$ for $s_i^2$ where

$s_i'^2 = \dfrac{N_i}{N_i - 1} s_i^2$.

This substitution eliminates any slight inaccuracies caused by substituting
$N_i$ for $N_i - 1$.

(iv) The initial approximations $L_i$ may in almost every case be gotten from
the following formula:



(v) In all that has been presented above it has been assumed that the sample 
has been drawn without replacements from a finite universe. Whether or not 
this assumption is tenable depends upon the particular object of the research. 

6. Example. In the Survey of Youth in the Labor Market conducted by the 
Division of Research in the Works Progress Administration youth who com- 
pleted the eighth grade in the school years 1928-1929, 1930-1931 and 1932-1933 
were studied. In six cities, Duluth, Denver, Birmingham, Seattle, San Fran- 
cisco, and St. Louis random samples from school records were selected. Funds 
permitted the use of 40,000 schedules. 

From school records it was possible to determine the total number of eighth 
grade graduates in each city for the years in question. The problem arose then 
as to what would be the most efficient method of distributing these 40,000 sched- 
ules among the six cities in order to compare the problems of youth. 

Assuming equal variances within cities, quotas were computed for each of the 
cities. From Table 1, summarizing the computations, it can be seen that the 
quotas fall somewhere between proportionate and equal frequencies. This last 
result would be expected if samplings had been made from infinite universes. 

7. Note. In the social sciences interest centers in deriving relationships 
among the various strata where each stratum is considered as a single unit. In 
such cases equal precision is desired. However, if the object of research is
TABLE 1

City                  8th grade  Initial   First       First     Second      Quotas  Percent
                      graduates  approxi-  correction  approxi-  correction          sampled
                                 mation    term        mation    term

Duluth, Minn.            5,500     4,000    -.02968     3,881    -.00077     3,878    70.51
Birmingham, Ala.         9,000     5,500    -.02690     5,352    -.00164     5,343    59.37
Denver, Colo.           12,500     6,000    +.06641     6,399    +.00148     6,409    51.27
Seattle, Wash.          15,000     6,500    +.07525     6,989    +.00257     7,007    46.71
San Francisco, Cal.     21,000     8,000    +.01426     8,114    -.00341     8,086    38.50
St. Louis, Mo.          31,000    10,000    -.07349     9,265    +.00129     9,277    29.93

Total                   94,000    40,000               40,000               40,000

simply to draw contrasts between any two strata we would seek to minimize the
standard error of the difference,

$\sigma_{i-j} = \sqrt{\dfrac{S_i^2 (N_i - n_i)}{n_i (N_i - 1)} + \dfrac{S_j^2 (N_j - n_j)}{n_j (N_j - 1)}}$

subject to the condition,

$\sum_{1}^{m} n_i = n$.


This leads to the result

$\dfrac{n_i}{n_j} = \dfrac{S_i}{S_j} \sqrt{\dfrac{N_i (N_j - 1)}{N_j (N_i - 1)}}$.

Thus, the number of samplings from each stratum is, for all practical purposes,
proportional to the standard deviations, irrespective of the size of the various
strata.


Washington, D. C. 


ON THE COEFFICIENTS OF THE EXPANSION OF $X^{(n)}$

By J. A. Joseph 

Let us construct the following triangular arrangement of numbers:

                    1
                  1    1
                1    3    2
              1    6    11    6
            1    10    35    50    24
            .    .    .    .    .    .
    1   f_1(n-1)   f_2(n-1)   ...   f_{n-2}(n-1)   f_{n-1}(n-1)
    1   f_1(n)     f_2(n)     ...   f_{n-1}(n)     f_n(n)

where the n-th row can be constructed from the preceding row by means of the
expression

(1) $n f_i(n - 1) + f_{i+1}(n - 1) = f_{i+1}(n)$.

For example, the element 35 in the middle of the 4th row is obtained from the
two elements immediately above it, 4·6 + 11 = 35. (The top element is
counted as the zeroth row.)

The elements in the (n - 1)st row are the coefficients in the expansion of
$x^{(n)}$ as a function of $x$, using the notation of the calculus of finite differences.
For example,

$x^{(4)} = x(x - 1)(x - 2)(x - 3) = x^4 - 6x^3 + 11x^2 - 6x$.

Of course, the signs of the coefficients alternate.

The function $f_i(n)$ is the sum of the products of the first n integers taken i
at a time, namely

(2) $f_i(n) = \sum \epsilon_1 \epsilon_2 \cdots \epsilon_i$,

the summation being a symmetric function of the integers 1, 2, 3, ..., n.
Equation (1) can be written as a linear, first order difference equation,

(3) $\Delta f_{i+1}(n - 1) \equiv f_{i+1}(n) - f_{i+1}(n - 1) = n f_i(n - 1)$,
    $f_{i+1}(n - 1) = \Delta^{-1} [n f_i(n - 1)]$.

Since $f_0(n) = 1$ for all values of n, we can find $f_1(n)$, and consequently $f_2(n)$,
and so on. Thus

(4) $f_1(n - 1) = \Delta^{-1} n = \dfrac{n^{(2)}}{2}$

    $f_2(n - 1) = \Delta^{-1} \left[ n \dfrac{n^{(2)}}{2} \right] = \dfrac{3n^{(4)} + 8n^{(3)}}{24}$

    $f_3(n - 1) = \Delta^{-1} \left[ n \dfrac{3n^{(4)} + 8n^{(3)}}{24} \right] = \dfrac{n^{(6)} + 8n^{(5)} + 12n^{(4)}}{48}$

The following theorems are true for the "Triangle":

Theorem 1. The sum of the elements in any n-th row is equal to (n + 1)!,
namely,

(5) $\sum_{i=0}^{n} f_i(n) = (n + 1)!$

Theorem 2. The sum of the even elements of any row is equal to the sum of the
odd elements, or

(6) $\sum_{i=0}^{n} (-1)^i f_i(n) = 0$.
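
The generating rule (1) and the two theorems can be checked numerically with a
minimal sketch such as the following. The function names are illustrative,
and the expansion of the factorial is obtained by direct multiplication of the
linear factors rather than by any method given in the text.

```python
from math import factorial

def next_row(prev, n):
    """Row n from row n-1 by f_{i+1}(n) = n*f_i(n-1) + f_{i+1}(n-1)."""
    padded = prev + [0]                       # f_n(n-1) is taken as zero
    return [1] + [n * padded[i] + padded[i + 1] for i in range(n)]

def triangle(last_row):
    rows = [[1]]                              # the zeroth row
    for n in range(1, last_row + 1):
        rows.append(next_row(rows[-1], n))
    return rows

def factorial_power_coeffs(k):
    """Coefficients of x(x-1)...(x-k+1), highest power first."""
    poly = [1]
    for j in range(k):
        shifted = poly + [0]                  # poly * x
        for i, c in enumerate(poly):
            shifted[i + 1] -= j * c           # minus j * poly
        poly = shifted
    return poly

rows = triangle(8)
print(rows[4])                                # the row containing 35
print(factorial_power_coeffs(4))              # x^4 - 6x^3 + 11x^2 - 6x
```

Comparing the absolute values of `factorial_power_coeffs(n)` (less the
vanishing constant term) with `rows[n - 1]` reproduces the statement about the
(n - 1)st row.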

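From these coefficients we can generate the Bernoulli numbers:

(7)
$B_0 - B_1 = \dfrac{1}{2}$
$2B_0 - 3B_1 + B_2 = \dfrac{2}{3}$
$6B_0 - 11B_1 + 6B_2 - B_3 = \dfrac{6}{4}$
$24B_0 - 50B_1 + 35B_2 - 10B_3 + B_4 = \dfrac{24}{5}$
$\cdots$
$f_n(n) B_0 - f_{n-1}(n) B_1 + f_{n-2}(n) B_2 - \cdots + (-1)^n f_0(n) B_n = \dfrac{n!}{n + 1}$
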

Or, as a determinant : 


(n + 1) ! 
n + 2 


0 


0 


/n(n) /„-i(n) fn-i{n) • • • /i(n) 


Bo — Bi — — I-, Ba — B 4 — Be — ' • • — Ban — 0, 

Bo = fV; Bb = — tWj • • • • 

We may now take another "triangle":

                  1    1
                1    3    1
              1    6    7    1
            1    10    25    15    1
            .    .    .    .    .
    1   F_1(n-1)   F_2(n-1)   ...   F_{n-2}(n-1)   F_{n-1}(n-1)
    1   F_1(n)     F_2(n)     ...   F_{n-1}(n)     F_n(n)

where the n-th row is obtained from the preceding row by the expression

(9) $(n - i) F_i(n - 1) + F_{i+1}(n - 1) = F_{i+1}(n)$.

For example, from the third row: 1, 6, 7, 1, we obtain the fourth row: 1, 4·1 + 6
= 10, 3·6 + 7 = 25, 2·7 + 1 = 15, 1. The following theorem is true for
the $F_i(n)$:

Theorem 3. The elements in the (n - 1)st row are the coefficients in the
expansion of $x^n$ as a function of the factorials $x^{(i)}$.

For example:

$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x$.
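
Theorem 3 can be verified numerically in the same spirit. The sketch below
builds the second triangle from rule (9), starting from a zeroth row
consisting of the single element 1 (an assumption consistent with the row
numbering of the example), and compares x^n with the corresponding sum of
factorials for a range of integers x. The function names are illustrative.

```python
def next_row_F(prev, n):
    """Row n from row n-1 by F_{i+1}(n) = (n - i)*F_i(n-1) + F_{i+1}(n-1)."""
    padded = prev + [0]                       # F_n(n-1) is taken as zero
    return [1] + [(n - i) * padded[i] + padded[i + 1] for i in range(n)]

def triangle_F(last_row):
    rows = [[1]]                              # assumed zeroth row
    for n in range(1, last_row + 1):
        rows.append(next_row_F(rows[-1], n))
    return rows

def factorial_power(x, k):
    """x^(k) = x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

def power_from_factorials(x, n, rows):
    """Sum of F_i(n-1) * x^(n-i), which Theorem 3 asserts to equal x^n."""
    return sum(F * factorial_power(x, n - i) for i, F in enumerate(rows[n - 1]))

rows_F = triangle_F(8)
print(rows_F[4])
print(power_from_factorials(5, 4, rows_F), 5 ** 4)
```

Since both sides are polynomials of degree n in x, agreement at more than n
integer points is sufficient to establish the identity for that n.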

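From the generating equation (9) we can obtain, as before, the form of the
functions $F_0(n), F_1(n), \ldots$

(10) $\Delta F_{i+1}(n - 1) = F_{i+1}(n) - F_{i+1}(n - 1) = (n - i) F_i(n - 1)$,
     $F_{i+1}(n - 1) = \Delta^{-1} [(n - i) F_i(n - 1)]$.

Since $F_0(n) = 1$ for all n

(11) $F_1(n - 1) = \Delta^{-1} n = \dfrac{n^{(2)}}{2}$

     $F_2(n - 1) = \Delta^{-1} \left[ (n - 1) \dfrac{n^{(2)}}{2} \right] = \dfrac{3n^{(4)} + 4n^{(3)}}{24}$

     $F_3(n - 1) = \Delta^{-1} \left[ (n - 2) \dfrac{3n^{(4)} + 4n^{(3)}}{24} \right] = \dfrac{n^{(6)} + 4n^{(5)} + 2n^{(4)}}{48}$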


From these coefficients we can generate the numbers of Laplace (the numbers
$L_m$ below must be divided by m! to yield the numbers of Laplace):

(12)
$L_1 = \dfrac{1}{2}$
$L_1 + L_2 = \dfrac{1}{3}$
$L_1 + 3L_2 + L_3 = \dfrac{1}{4}$
$L_1 + 7L_2 + 6L_3 + L_4 = \dfrac{1}{5}$
$L_1 + 15L_2 + 25L_3 + 10L_4 + L_5 = \dfrac{1}{6}$
$\cdots$
$L_1 + F_{n-1}(n) L_2 + F_{n-2}(n) L_3 + \cdots + L_{n+1} = \dfrac{1}{n + 2}$

giving

$L_1 = \tfrac{1}{2}$, $L_2 = -\tfrac{1}{6}$, $L_3 = \tfrac{1}{4}$, $L_4 = -\tfrac{19}{30}$, $L_5 = \tfrac{9}{4}$.

A determinantal solution is also obvious.

California Institute of Technology.
ON THE PROBABILITY OF ATTAINING A GIVEN STANDARD
DEVIATION RATIO IN AN INFINITE SERIES OF TRIALS

By Joseph A. Greenwood and T. N. E. Greville

Suppose an event with constant probability p of occurrence to be repeated an
infinite number of times, and suppose the ratio of the deviation from the
expected number of successes to the standard deviation $\sqrt{npq}$ to be recomputed
after each trial. We are interested in the probability that this ratio will at
some time equal or exceed some positive number k. It is not difficult to show
that the value of this probability is unity, but as the fact has not, to our
knowledge, been previously pointed out in the literature, we give the following proof.

Let $x_n$ denote the number of successes obtained in the first n trials, let

$t_n = \dfrac{x_n - np}{\sqrt{npq}}$,

and let P denote the probability that, for some n, $t_n \ge k$. We shall prove that
P = 1. To do this, let the infinite series of trials be subdivided into
consecutive, mutually exclusive subseries of finite length, and let $m_i$ denote the number
of trials in the i-th subseries. Let $N_i = \sum_{j=1}^{i-1} m_j$ for $i \ge 2$, while $N_1 = 0$. Let $k'$
be any number greater than k, and let $m_i$ be so chosen that


( 1 ) 



for every i, 


and 

(2) V^(^' - ^ forf^2. 

It follows from (1) that

(3) $m_i \ge m_i p + k' \sqrt{m_i pq}$ for every i.

It follows from (2) that

(4) $m_i p + k' \sqrt{m_i pq} \ge (N_i + m_i) p + k \sqrt{(N_i + m_i) pq}$ for every i.

Let $y_i$ denote the number of successes in the i-th subseries. It is evident from
(4) that if

(5) $y_i \ge m_i p + k' \sqrt{m_i pq}$

for any i, then

$t_{N_i + m_i} \ge k$.

Hence P is at least equal to the probability that (5) holds for some i.

Let $p_i$ denote the probability that (5) holds for a particular i. It follows
from (3) that, for every i, $p_i > 0$. Moreover, there exists a positive integer M
and a number $h > 0$, such that if $m_i \ge M$, $p_i \ge h$.¹ Since there is but a finite
number of possible values of $m_i$ less than M, there is a number $p_0 > 0$ such that
$p_i \ge p_0$ for every i. Hence the probability that (5) holds for no value of i
is at most

$\lim_{i \to \infty} (1 - p_0)^i = 0$.

Therefore, the probability that (5) holds for some i is unity.
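
No finite experiment can confirm a statement about an infinite series of
trials, but a small simulation may illustrate how the proportion of series in
which the ratio has reached k can only grow as more trials are allowed. The
sketch below is offered in that spirit only; the number of paths, the horizons,
the seed and the function names are arbitrary choices and not part of the
proof.

```python
import random
from math import sqrt

def first_hit(k, p, horizon, rng):
    """First n <= horizon with t_n >= k, or None if the ratio never reaches k."""
    q = 1 - p
    x = 0                                     # successes so far
    for n in range(1, horizon + 1):
        if rng.random() < p:
            x += 1
        if (x - n * p) / sqrt(n * p * q) >= k:
            return n
    return None

def hit_fractions(k, p, horizons, paths=200, seed=0):
    """Fraction of simulated series in which t_n >= k for some n <= each horizon."""
    rng = random.Random(seed)
    hits = [first_hit(k, p, max(horizons), rng) for _ in range(paths)]
    return [sum(h is not None and h <= H for h in hits) / paths for H in horizons]

fractions = hit_fractions(1.0, 0.5, [10, 100, 1000])
print(fractions)
```

The fractions are computed from the same simulated series at each horizon, so
they are necessarily non-decreasing; the theorem asserts that their limit, as
the horizon grows without bound, is unity.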

Duke University

AND

University of Michigan.


¹ Uspensky, J. V., Introduction to Mathematical Probability, p. 129.



CONTRIBUTIONS TO THE THEORY OF STATISTICAL ESTIMATION 
AND TESTING HYPOTHESES* 

By Abraham Wald

1. Introduction. Let us consider a family of systems of n variates
$X_1(\theta^{(1)}, \ldots, \theta^{(k)}), \ldots, X_n(\theta^{(1)}, \ldots, \theta^{(k)})$ depending on k parameters
$\theta^{(1)}, \ldots, \theta^{(k)}$. A system of k values $\theta^{(1)}, \ldots, \theta^{(k)}$ can be represented in the
k-dimensional parameter space by the point $\theta$ with the co-ordinates
$\theta^{(1)}, \ldots, \theta^{(k)}$. Denote by $\Omega$ the set of all possible points $\theta$. For any point
$\theta$ of $\Omega$ we shall denote by $P(E \in w \mid \theta)$ the probability that the sample point
$E = (x_1, \ldots, x_n)$ falls into the region $w$ of the n-dimensional sample space, where
$x_j$ denotes the observed value of the variate $X_j(\theta)$ $(j = 1, \ldots, n)$. The
distribution $P(E \in w \mid \theta)$ is supposed to be known for any point $\theta$ of $\Omega$. In the
theory of testing hypotheses and of statistical estimation we have to deal with
problems of the following type: A sample point $E = (x_1, \ldots, x_n)$ of the
n-dimensional sample space is given. We know that $x_j$ is the observed value of
$X_j(\theta)$ but we do not know the parameter point $\theta$, and we have to draw
inferences about $\theta$ by means of the sample point observed. The assumption that
$\theta$ belongs to a certain subset $\omega$ of $\Omega$ is called a hypothesis. We shall deal in
this paper with the following general problem: Let us consider a system S of
subsets of $\Omega$. Denote by $H_\omega$ the hypothesis corresponding to the element $\omega$
of S, and by $H_S$ the system of all hypotheses corresponding to all elements of S.
We have to decide by means of the observed sample point E which hypothesis of
the system $H_S$ should be accepted. That is to say for each $H_\omega$ we have to
determine a region of acceptance $M_\omega$ in the n-dimensional sample space. The
hypothesis $H_\omega$ will be accepted if and only if the sample point E falls in the
region $M_\omega$. $M_\omega$ and $M_{\omega'}$ are disjoint if $\omega \ne \omega'$. The statistical problem
is the question as to how the system $M_S$ of all regions $M_\omega$ should be chosen.

The problem in this formulation is very general. It contains the problems of
testing hypotheses and of statistical estimation treated in the literature.¹ For
instance if we want to test the hypothesis $H_\omega$ corresponding to a certain subset
$\omega$ of $\Omega$, the system of hypotheses $H_S$ consists only of the two hypotheses
$H_\omega$ and $H_{\bar\omega}$ where $\bar\omega$ denotes the subset of $\Omega$ complementary to $\omega$. If we
want to estimate $\theta$ by a unique point, then S is the system of all points of
$\Omega$. In the theory of confidence intervals we estimate one of the parameter
co-ordinates $\theta^{(1)}, \ldots, \theta^{(k)}$, say $\theta^{(1)}$, by an interval. In this case S is a
certain system of subsets $\omega$ of the following type: $\omega$ is the set of all points
$\theta = (\theta^{(1)}, \ldots, \theta^{(k)})$ for which $\theta^{(1)}$ lies in a certain interval [a, b]. The
problem in our formulation covers also cases which, as far as I know, have not
yet been treated. Consider for instance 3 subsets $\omega_1$, $\omega_2$ and $\omega_3$ of $\Omega$ such
that the sum of them is equal to $\Omega$. It may be that we are interested only to
know in which of the subsets $\omega_1$, $\omega_2$, $\omega_3$ the unknown parameter point lies.
In this case the system of hypotheses $H_S$ consists only of the 3 hypotheses
$H_{\omega_1}$, $H_{\omega_2}$ and $H_{\omega_3}$. Cases like this might be of practical interest.


* Research under a grant-in-aid from the Carnegie Corporation of New York.

¹ See, for instance, J. Neyman, "Outline of a Theory of Statistical Estimation Based on
the Classical Theory of Probability," Phil. Transactions of the Royal Society, London,
Vol. 236 (1937), pp. 333-380.


For the determination of the "best" system (in a certain sense) of regions of
acceptance we shall use methods and principles which are closely related to those
of the Neyman-Pearson theory of testing hypotheses. In the Neyman-Pearson
theory two types of error are considered. Let $\theta = \theta_1$ be the hypothesis to be
tested, where $\theta_1$ denotes a certain point of the parameter space. Denote this
hypothesis by $H_1$ and the hypothesis $\theta \ne \theta_1$ by $\bar{H}$. The type I error is that
which is made by rejecting $H_1$ when it is true. The type II error is made by
accepting $H_1$ when it is false. The fundamental principle in the Neyman-Pearson
theory can be formulated as follows: among all critical regions (regions of
rejection of $H_1$, i.e. regions of acceptance of $\bar{H}$) for which the probability of
type I error is equal to a given constant $\alpha$, we have to choose that region for
which the probability of type II error is a minimum. The difficulty which arises
here lies in the circumstance that the probability of type II error depends on
the true parameter point $\theta$. That is to say, if the critical region is given the
probability of type II error will be a function of the true parameter point
$\theta$. Since we do not know the true parameter point $\theta$, we want to have a
critical region which minimizes the probability of type II error with respect to
any possible alternative hypothesis $\theta = \theta_2 \ne \theta_1$. If such a common best
critical region exists, then the problem is solved. But such cases are rather
exceptional. If a common best critical region does not exist, Neyman and
Pearson consider unbiased critical regions of different types,² which minimize
the type II error locally, that is to say with respect to alternative hypotheses
in the neighborhood of the hypothesis considered. In this paper we develop
methods for the determination of a system of regions of acceptance taking into
account type II errors also relative to alternative hypotheses not lying in the
neighborhood of the hypothesis to be tested.

2. Some Definitions. Let us denote by $\Omega$ the set of all possible parameter
points $\theta$ and by S a system of subsets of $\Omega$. If $\rho$ denotes the sum of the
elements of a subset $\sigma$ of S, then we shall denote $\sum M_\omega$ by $M_\rho$, where $M_\omega$
denotes the region of acceptance of $H_\omega$ and the summation is to be taken over
all elements $\omega$ of $\sigma$.


² J. Neyman and E. S. Pearson: Statistical Research Memoirs, Volumes I and II. The
authors consider also unbiased regions of type $A_1$ for which the probability of type II
error with respect to every alternative hypothesis is not greater than for any other unbiased
region of the same size. However regions of type $A_1$ do not always exist (the existence of
such regions has been proved for a special but important class of cases).


Definition 1. Denote by $M_S$ and $M'_S$ two different systems of regions of
acceptance corresponding to the same system $H_S$ of hypotheses. The systems
$M_S$ and $M'_S$ are said to be equivalent if for each point $\theta$ of $\Omega$ and for every
$\rho$ which is a sum of elements of S which does not contain $\theta$, the equation

$P(E \in M'_\rho \mid \theta) = P(E \in M_\rho \mid \theta)$

holds, where $M'_\rho$ denotes the region according to the system $M'_S$ and $M_\rho$
denotes the region according to the system $M_S$.

Definition 2. Denote by $M_S$ and $M'_S$ two different systems of regions of
acceptance corresponding to the same system of hypotheses. The system $M'_S$
is said to be absolutely better than the system $M_S$ if they are not equivalent
and if for each $\theta$ and for every $\rho$ which is a sum of elements of S which does
not contain $\theta$ the inequality

$P(E \in M'_\rho \mid \theta) \le P(E \in M_\rho \mid \theta)$

holds.

Definition 3. A system $M_S$ of regions of acceptance is said to be admissible
if no absolutely better system of regions exists.

3. The problem of the choice of $M_S$. The choice of $M_S$ will in general be
affected by the following two circumstances:

(1) We do not attribute the same importance to each error. For instance
the acceptance of the hypothesis that $\theta$ lies in a certain interval I has in general
more serious consequences if $\theta$ is far from I than if $\theta$ is near to I. The choice of
$M_S$ will in general depend on the relative importance of the different possible
errors.

(2) In some cases we have a priori more confidence that the true parameter
point lies in a certain interval I than in some other cases. The choice of $M_S$
will in general be affected also by this fact. Let us illustrate this by an example.
We have two coins, a new and an old one, and we want to test for both coins
whether the probability p of tossing head is equal to 1/2. Let us assume that we
make 100 tosses with each of the coins and we get head 40 times in each case.
Since we have a priori no very great confidence that the old coin is unbiased,
the fact that head occurred only 40 times will suffice to reject the hypothesis that
for the old coin p = 1/2. But in the case of the new coin, having much greater a
priori confidence that it is unbiased, we shall perhaps not reject the hypothesis
p = 1/2 and we shall rather assume that a somewhat improbable event occurred.
That is to say, we do not choose the same critical region in both cases due to the
fact that our a priori confidence for p = 1/2 is in the case of the new coin greater
than in the case of the old one.
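
To give a sense of how improbable the event in the coin example is, the
following sketch computes the exact binomial probability of obtaining 40 heads
or fewer in 100 tosses when p = 1/2. The text does not specify a critical
region, so the one-sided tail used here is an assumption made purely for
illustration, as is the function name.

```python
from math import comb

def lower_tail(heads, tosses, p=0.5):
    """Probability of `heads` or fewer successes in `tosses` independent trials."""
    return sum(comb(tosses, j) * p ** j * (1 - p) ** (tosses - j)
               for j in range(heads + 1))

tail = lower_tail(40, 100)
print(tail)            # one-sided; doubling gives the two-sided figure by symmetry
```

A probability of this order is small but not negligible, which is why the
decision may reasonably differ between the two coins according to the a priori
confidence placed in each.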

In order to study the dependence of the choice of $M_S$ on the two circumstances
mentioned, let us introduce a weight function for the possible errors and an a
priori probability distribution for the unknown parameter $\theta$. The weight
function $W(\theta, \omega)$ is a real valued non-negative function defined for all points
$\theta$ of $\Omega$ and for all elements $\omega$ of S, which expresses the relative importance of
the error committed by accepting $H_\omega$ when $\theta$ is true. If $\theta$ is contained in
$\omega$, $W(\theta, \omega)$ is, of course, equal to zero. The question as to how the form of the
weight function $W(\theta, \omega)$ should be determined, is not a mathematical or
statistical one. The statistician who wants to test certain hypotheses must first
determine the relative importance of all possible errors, which will entirely
depend on the special purposes of his investigation. If that is done, we shall in
general be able to give a more satisfactory answer to the question as to how the
system of regions of acceptance should be chosen. In many cases, especially in
statistical questions concerning industrial production, we are able to express
the importance of an error in terms of money, that is to say, we can express the
loss caused by the error considered in terms of money. We shall also say that
$W(\theta, \omega)$ is the loss caused by accepting $H_\omega$ when $\theta$ is true.

The situation regarding the introduction of an a priori probability distribution
of $\theta$ is entirely different. First, the objection can be made against it, as Neyman
has pointed out, that $\theta$ is merely an unknown constant and not a variate, hence
it makes no sense to speak of the probability distribution of $\theta$. Second, even if
we may assume that $\theta$ is a variate, we have in general no possibility of
determining the distribution of $\theta$ and any assumptions regarding this distribution
are of hypothetical character. On account of these facts the determination of the
system of regions of acceptance should be independent of any a priori probability
considerations. The "best" system of regions of acceptance, which we shall
define later, will depend only on the weight function of the errors. The reason
why we introduce here a hypothetical probability distribution of $\theta$ is simply
that it proves to be useful in deducing certain theorems and in the calculation
of the best system of regions of acceptance.

Let us denote by $f(\theta)$ a distribution function of $\theta$. For the sake of simplicity
let us assume that the probability density of the distribution $P(E \in w \mid \theta)$ exists
in any point E of the sample space for any $\theta$ and denote it by $p(E \mid \theta)$. The
expected value of the loss is given by

(1) $I = \int_M \int_\Omega W(\theta, \omega_E) \, p(E \mid \theta) \, df(\theta) \, dE$

where $\omega_E$ denotes the element of S corresponding to E (that is to say, $\omega_E$ is that
element of S for which E is a point of the region of acceptance $M_{\omega_E}$), and the
integral is to be taken over the product of the sample space M with the
parameter space $\Omega$. The expected value I of the loss depends on the system $M_S$ of
regions of acceptance. The system $M_S$ for which I becomes a minimum can be
regarded as the best system of regions relative to the given weight function and
to the given a priori distribution of $\theta$.

One can easily show the following: If $M'_S$ is an absolutely better system of
regions (in sense of the definition 2) than the system $M_S$, then for any weight
function $W(\theta, \omega)$ and for any a priori distribution $f(\theta)$ the expected value $I'$ of
the loss corresponding to $M'_S$ is less than the expected value I of the loss
corresponding to $M_S$. (For some exceptional weight and a priori distribution
functions $I'$ may be equal to I.)

Hence we can give the following rule: We have to choose an admissible system of
regions of acceptance.

Now let us consider the question whether besides admissibility further
restrictions upon the choice of $M_S$ can be made. In order to see this, let us consider
two admissible systems of regions $M_S$ and $M'_S$ which are not equivalent. One
can easily show that there exist two weight functions $W_1(\theta, \omega)$, $W_2(\theta, \omega)$ and two
a priori distributions $f_1(\theta)$ and $f_2(\theta)$ such that for $W_1(\theta, \omega)$ and $f_1(\theta)$ the expected
value of the loss corresponding to $M_S$ is less than that corresponding to $M'_S$, and
for $W_2(\theta, \omega)$ and $f_2(\theta)$ the expected value of the loss corresponding to $M_S$ is greater
than that corresponding to $M'_S$. Hence no absolute criteria can be given as to
which of the systems $M_S$ and $M'_S$ should be chosen. In order to be able to make
further restrictions upon the choice of $M_S$, we have to make assumptions
regarding the form of the weight function. We shall deal with this question in section 6.

4. Calculation of admissible systems of regions. As we have seen, we have to
choose an admissible system of regions. The question arises as to how we can
find admissible systems of regions.

Provided that p(E | θ) is continuous in E and θ jointly, one can easily show
that M′_S is an admissible system of regions if there exists a bounded, uniformly
continuous and everywhere positive (except if θ is contained in ω) weight function
W(θ, ω) and an a priori distribution f(θ) such that every open subset of Ω
has a positive probability and the expected value of the loss

(2)    I(M_S) = \int_M \int_\Omega W(\theta, \omega_E)\, p(E \mid \theta)\, df(\theta)\, dE

becomes a minimum for M_S = M′_S. (ω_E denotes that element of S for which
M_{ω_E} contains E.) In fact if there existed an absolutely better system M″_S of
regions, then I(M″_S) would be less than I(M′_S), in contradiction to our assumption
that I(M_S) becomes a minimum for M_S = M′_S.

In order to obtain an admissible system M_S we may choose any bounded,
uniformly continuous and everywhere positive (except if θ is contained in ω)
weight function W(θ, ω) and any arbitrary a priori distribution f(θ) (subject
only to the condition that every open subset of Ω should have a positive
probability) and then the system M_S which makes

       I(M_S) = \int_M \int_\Omega W(\theta, \omega_E)\, p(E \mid \theta)\, df(\theta)\, dE

a minimum is an admissible one. In order to determine M_S we have only to
determine for each E the corresponding element ω_E of S. Let us consider the
integral

       I_E = \int_\Omega W(\theta, \omega)\, p(E \mid \theta)\, df(\theta).



The integral I_E is for a fixed E only a function of ω. It is obvious that ω_E must
be that element of S for which I_E becomes a minimum.
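The rule just stated can be illustrated numerically. The following is only a sketch under assumptions that are not in the text: the a priori distribution is taken to be a point distribution on three hypothetical parameter points, S has two elements, and the weight is 0 when θ lies in ω and 1 otherwise. The integral I_E then reduces to a finite sum, and ω_E is the element with the smallest sum.

```python
def best_element(weights, likelihood, prior):
    """Return (index, scores), where scores[j] approximates
    I_E(omega_j) = sum_i W(theta_i, omega_j) * p(E | theta_i) * f(theta_i)
    and index is the position of the smallest score."""
    scores = [
        sum(w * l * f for w, l, f in zip(row, likelihood, prior))
        for row in weights
    ]
    index = min(range(len(scores)), key=scores.__getitem__)
    return index, scores


# Hypothetical data: weights[j][i] = W(theta_i, omega_j).
weights = [
    [0.0, 1.0, 1.0],  # omega_1 = {theta_1}
    [1.0, 0.0, 0.0],  # omega_2 = {theta_2, theta_3}
]
prior = [0.5, 0.25, 0.25]      # f(theta_i), assumed
likelihood = [0.1, 0.6, 0.3]   # p(E | theta_i) for the observed E, assumed

index, scores = best_element(weights, likelihood, prior)
print(index, scores)
```

Here the second element has the smaller sum, so the observed E would be assigned to the region of acceptance of ω_2.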

5. Admissible systems M_S and the Neyman-Pearson best critical regions.

Let us consider the case that the system H_S of hypotheses consists only of the
following two hypotheses: 1) θ = θ_0, where θ_0 is a certain point of Ω. 2) θ
belongs to the set complementary to θ_0. Let us denote by ω_1 the set consisting
only of the point θ_0, and by ω_2 the set complementary to ω_1. S consists in this
case only of two elements ω_1 and ω_2. The system M_S of regions consists of two
regions of acceptance M_{ω_1} and M_{ω_2} corresponding to the hypotheses H_{ω_1}
and H_{ω_2}.

If a common best critical region in the sense of Neyman-Pearson exists and if
M_S is admissible, then M_{ω_2} is obviously a common best critical region. This
leads to the following remarkable conclusion: If a common best critical region
exists and if the system M_S of regions consisting of the two regions M_{ω_1} and
M_{ω_2} minimizes the expectation of the loss (formula 2) for a weight function and
for an a priori distribution subject to some weak conditions mentioned in
paragraph 4, then M_{ω_2} is a common best critical region. That is to say, the form
of the weight function and of the a priori distribution affects only the size of the
region M_{ω_2}, but it will always be a common best critical region.

6. The choice of M_S if a weight function is given. We shall now consider the
case in which a weight function W(θ, ω) is given and we shall deal with the
question as to how M_S in this case is to be chosen.

If the parameter point is an unknown constant and if θ denotes the true
parameter point, then the expected value of the loss is given by

(3)    r(\theta) = \int_M W(\theta, \omega_E)\, p(E \mid \theta)\, dE

where the integration is to be taken over the whole sample space M and H_{ω_E}
denotes the hypothesis accepted if E is the observed sample point. That is
to say, ω_E is that element of S for which E is contained in the region of acceptance
M_{ω_E}. We shall call the expression (3) the risk of accepting a false hypothesis
if θ is the true parameter point. Since we do not know the true parameter
point θ, we shall have to study the risk r(θ) as a function of θ. We shall call this
function the risk function. The form of the risk function depends on the
system M_S of regions and on the form of the weight function. In order to
express this fact, we shall denote the risk function corresponding to the system
M_S and to the weight function W(θ, ω) also by

       r[\theta \mid M_S, W(\theta, \omega)].

Definition 4. Denote by M_S and M′_S two systems of regions of acceptance
corresponding to the same system H_S of hypotheses. We shall say that M_S
and M′_S are equivalent relative to the weight function W(θ, ω) if the risk function
r[θ | M_S, W(θ, ω)] is identically equal to the risk function r[θ | M′_S, W(θ, ω)],
that is to say if for each point θ,

       r[\theta \mid M_S, W(\theta, \omega)] = r[\theta \mid M'_S, W(\theta, \omega)].

Definition 5. Denote by M_S and M′_S two systems of regions corresponding
to the same system H_S of hypotheses. We shall say that M_S is uniformly
better than M′_S relative to the weight function W(θ, ω) if M_S and M′_S are not
equivalent and for each θ

       r[\theta \mid M_S, W(\theta, \omega)] \le r[\theta \mid M'_S, W(\theta, \omega)].

Definition 6. A system M_S of regions of acceptance is said to be admissible
relative to the weight function W(θ, ω) if no uniformly better system of regions
exists relative to the weight function considered.

It is obvious that we have to choose a system M_S of regions which is admissible
relative to the weight function considered.
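Definitions 4 to 6 compare risk functions pointwise, and the comparison can be sketched on a finite grid of parameter values. The example below is illustrative only and rests on assumptions not made in the text: the three risk functions are the squared-error risks of the mean of 10 Bernoulli trials, of the mean of the first 5 trials only, and of the constant estimate 0.5. A grid check can refute the relation "uniformly better" but cannot prove it for the whole parameter space.

```python
def uniformly_better(risk_a, risk_b, grid, tol=1e-12):
    """Grid version of Definition 5: risk_a never exceeds risk_b on the
    grid, and the two risk functions differ at some grid point."""
    never_worse = all(risk_a(t) <= risk_b(t) + tol for t in grid)
    differs = any(abs(risk_a(t) - risk_b(t)) > tol for t in grid)
    return never_worse and differs


grid = [i / 100 for i in range(101)]


def risk_full(t):
    return t * (1 - t) / 10   # mean of 10 trials, squared-error weight


def risk_half(t):
    return t * (1 - t) / 5    # mean of the first 5 trials only


def risk_fixed(t):
    return (t - 0.5) ** 2     # constant estimate 0.5


print(uniformly_better(risk_full, risk_half, grid))
print(uniformly_better(risk_full, risk_fixed, grid))
print(uniformly_better(risk_fixed, risk_full, grid))
```

In this sketch the full-sample mean is uniformly better than the half-sample mean, while the full-sample mean and the constant estimate are not comparable: each has the smaller risk somewhere on the grid.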

There exist in general many systems M_S which are admissible relative to the
weight function given. The question arises as to how we can distinguish among
them. Denote by r_{M_S} the maximum of the risk function corresponding to the
system M_S of regions and to the given weight function. If we do not take into
consideration a priori probabilities of θ, then it seems reasonable to choose that
system M_S for which r_{M_S} becomes a minimum. We shall see in section 8 that
the system M_S for which r_{M_S} becomes a minimum has some important properties
which justify the distinction of this particular system of regions among all
admissible systems.

Definition 7. We shall call an admissible system M_S of regions for which
r_{M_S} becomes a minimum a best system of regions of acceptance relative to the
weight function given.⁴

Now we shall have to deal with the question of determining a best system M_S
of regions and what special properties this system M_S has.

7. Reduction of the problem to the case when the system H_S of hypotheses is
the system of all simple hypotheses. A hypothesis H_ω is said to be a simple
hypothesis if ω contains exactly one point of the parameter space Ω. We assume
that each element ω of S is a closed subset of Ω. Hence the power of S is not
greater than the power of the continuum and therefore we can always set up a
correspondence between the elements ω of S and the points of Ω such that to
each point θ corresponds a certain element ω_θ of S and to each element ω of S
at least one point θ exists for which ω_θ = ω. For instance if S consists of the
two elements ω_1 and ω_2 then we can set up a correspondence as follows: the
element of S corresponding to θ is ω_1 if θ is contained in ω_1 and ω_2 otherwise.


⁴ As we shall see later (Theorem 3), the best system of regions is uniquely determined if
some regularity conditions are fulfilled.

If Ω is one dimensional and S is the system of all intervals of a certain length ε
then we can define the interval ω_θ corresponding to θ as the interval of which
the initial point is θ and the terminal point θ + ε.

Let us denote the weight function by W(θ, ω) defined for all values of θ and
for all elements ω of S. Consider the system H̄_S of all simple hypotheses and
the following weight function

(4)    W(\theta, \bar\theta) = W(\theta, \omega_{\bar\theta})

where θ denotes the true parameter point and θ̄ denotes the estimated point.
A system M̄_S of regions of acceptance for H̄_S is given by a vector function θ̄(E)
of the observations such that to each point E = (x_1, ..., x_n) of the sample
space M corresponds a certain point θ̄(E) of the parameter space. For each
point θ_0 the region of acceptance of the hypothesis θ = θ_0 is given by
the equation θ̄(E) = θ_0. We shall call the function θ̄(E) an estimate of θ;
the system of regions M̄_S is uniquely determined by the estimate. We shall
call θ̄(E) a best estimate relative to a given weight function if the system of
regions determined by θ̄(E) is a best system of regions relative to the weight
function considered.

Let us denote by θ̄(E) a best estimate of θ relative to the weight function
W(θ, θ̄) defined in (4). A best system M_S of regions of acceptance in the original
problem can obviously be obtained in the following way: Denote by ω an element
of S. The region of acceptance of the hypothesis H_ω consists of the points E
for which

       \omega_{\bar\theta(E)} = \omega.

Hence we can restrict our considerations to the case when the system of hypotheses
is the system of all simple hypotheses. We shall deal with the problem of
how a best estimate of θ can be found and what properties this estimate has.

8. Some theorems concerning the best estimate. In order to study the
properties of a best estimate θ̄(E) it is useful to consider hypothetical a priori
distributions of θ. We shall especially consider point distributions of θ, that
is to say, distributions where a finite number of points θ_1, ..., θ_s of the
parameter space Ω exist such that the probability of any subset of Ω not containing
any of the points θ_1, ..., θ_s is zero. If θ_1, ..., θ_s are given, a point
distribution is characterized by a vector p = (p_1, ..., p_s) where p_i denotes the
probability of θ_i and Σp_i = 1.

If θ̄(E) denotes an estimate of θ and if f(θ) denotes a distribution function of θ
then the expected value of the loss, that is to say the expected value of the weight
function W[θ, θ̄(E)], is obviously given by

(5)    \int_M \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta)\, dE

where p(E | θ) denotes the probability density in E if θ is the true parameter
point and the integration is to be taken over the product of the sample space M
and parameter space Ω.

Let us assume that for every sample point E there exists a parameter point
θ_f(E) such that the expression

(6)    \int_\Omega W(\theta, \bar\theta)\, p(E \mid \theta)\, df(\theta)

becomes a minimum with respect to θ̄ for θ̄ = θ_f(E). We shall call the estimate
θ_f(E) a minimum risk estimate with respect to the distribution f(θ), since also
the expression (5) becomes a minimum for the estimate θ_f(E).
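The definition of the minimum risk estimate can be made concrete by a small numerical sketch. The setting is an assumption chosen for illustration and does not occur in the text: the sample is the number x of successes in n Bernoulli trials, f(θ) is a point distribution on two parameter points, the weight function is the squared error, and the minimum of expression (6) is sought over a finite grid of candidate values of θ̄.

```python
from math import comb


def minimum_risk_estimate(x, n, points, probs, candidates):
    """Minimise expression (6) over `candidates` for a binomial sample,
    a point distribution (points, probs) and squared-error weight."""
    def expected_loss(theta_bar):
        return sum(
            (t - theta_bar) ** 2 * comb(n, x) * t**x * (1 - t) ** (n - x) * p
            for t, p in zip(points, probs)
        )
    return min(candidates, key=expected_loss)


candidates = [i / 1000 for i in range(1001)]   # grid of candidate estimates
estimate = minimum_risk_estimate(3, 4, [0.2, 0.8], [0.5, 0.5], candidates)
print(estimate)
```

For the squared-error weight the exact minimiser is the a posteriori mean of θ, here 0.3328/0.4352; the grid search returns the candidate nearest to that value.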

We shall make the following assumptions:

Assumption 1. The parameter space Ω is a bounded and closed subset of the
k-dimensional Euclidean space.

Assumption 2. The weight function W(θ, θ̄) is continuous in θ and θ̄ jointly.

Assumption 3. The probability density p(E | θ) is continuous in E and θ
jointly. That is to say if lim E_ν = E and lim θ_ν = θ then lim p(E_ν | θ_ν) =
p(E | θ).

Assumption 4. For any distribution f(θ) of θ there exists at most one minimum
risk estimate θ_f(E).⁵

Assumption 5. If f(θ) and f′(θ) denote two different point distributions of θ
and if θ_f(E) and θ_{f′}(E) are minimum risk estimates corresponding to f(θ) and
f′(θ) respectively, then θ_f(E) is not identically equal to θ_{f′}(E).

The Assumptions 1-5, with the addition of an Assumption 6 which we shall formulate
later, enable us to deduce important properties of the best estimate θ̄(E).
First we shall prove some Lemmas by means of the Assumptions 1-5.

Lemma 1. For any a priori distribution f(θ) of θ there exists exactly one
minimum risk estimate θ_f(E).

According to Assumption 2 W(θ, θ̄) is continuous. Since the parameter
space is compact on account of Assumption 1, W(θ, θ̄) is uniformly continuous.
According to Assumption 3 p(E | θ) is continuous; hence for any fixed sample
point E, p(E | θ) is bounded. From these facts it follows easily that the expression
(6) is a continuous function of θ̄ for any fixed sample point E. Hence there
exists at least one parameter point θ_f(E) such that (6) becomes a minimum for
θ̄ = θ_f(E). Since, according to Assumption 4, at most one parameter point
exists for which (6) becomes a minimum, Lemma 1 is proved.

If a distribution f(θ) of θ is given then the distribution of each of the components
θ^{(1)}, ..., θ^{(k)} of θ can be found. Denote by Q_i the set of real numbers
which are discontinuities of the distribution of the component θ^{(i)} (i = 1, ..., k)
and form the set Q = Q_1 + ... + Q_k. As is well known, Q is at most denumerable.
A k-dimensional interval J of the parameter space given by

       a_i < \theta^{(i)} < b_i    (i = 1, ..., k)

is called a continuity interval of the distribution f(θ) if no a_i and no b_i belongs to
Q. A sequence {f_n(θ)} of distributions is said to be convergent towards the


⁵ As will be shown in Section 10, Assumption 4 is not as restrictive as it would appear.
It will be satisfied in the great majority of practical cases.



distribution f(θ), i.e. in symbols lim f_n(θ) = f(θ), if for any continuity interval
J of f(θ) the probability of J corresponding to the distribution f_n(θ) converges
with increasing n towards the probability of J corresponding to the distribution
f(θ).

Lemma 2. If {f_n(θ)} (n = 1, ..., ad inf.) denotes a sequence of distributions,
then there exists a subsequence {f_{n_m}(θ)} (m = 1, ..., ad inf.) which converges
towards a distribution.

As is well known, there exists a completely additive set function P(ω) defined
for all Borel measurable subsets ω of Ω and a subsequence {n_m} of {n}, such
that for any continuity interval J of P(ω) the probability of J corresponding
to the distribution f_{n_m}(θ) converges with increasing m towards P(J). Since Ω is
bounded, there exists a continuity interval J such that for all n the probability
of J according to f_n(θ) is equal to 1. Hence P(Ω) = 1, that is to say, P(ω) is a
probability set function, which proves Lemma 2.

Lemma 3. If {f_n(θ)} (n = 1, ..., ad inf.) denotes a sequence of distributions
which converges towards the distribution f(θ) and if lim E_n = E then

       \lim_{n \to \infty} \theta_{f_n}(E_n) = \theta_f(E),

where θ_{f_n}(E) denotes the minimum risk estimate corresponding to f_n(θ) and θ_f(E)
denotes the minimum risk estimate corresponding to f(θ).

If {φ_n(θ)} denotes a sequence of real valued functions which converges
uniformly towards a continuous function φ(θ) then

(7)    \lim_{n \to \infty} \int_\Omega \varphi_n(\theta)\, df_n(\theta) = \int_\Omega \varphi(\theta)\, df(\theta).

Since {φ_n(θ)} converges uniformly towards φ(θ), (7) is obviously true if

       \lim_{n \to \infty} \int_\Omega \varphi(\theta)\, df_n(\theta) = \int_\Omega \varphi(\theta)\, df(\theta)

holds. The latter equality follows easily from the fact that Ω is compact.
Consider a subsequence {n_m} of {n} such that lim θ_{f_{n_m}}(E_{n_m}) exists. Denote
this limit by θ*. In order to prove Lemma 3, we have only to show that θ* =
θ_f(E). If θ_f(E) ≠ θ* then on account of Assumption 4

(8)    \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df < \int_\Omega W(\theta, \theta^*)\, p(E \mid \theta)\, df.

W(θ, θ̄) is uniformly continuous since Ω is compact. On account of Assumption
3 also p(E | θ) is uniformly continuous in the product of Ω with a bounded subset
of the sample space. Hence

       W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)

converges uniformly in θ towards

       W(\theta, \theta^*)\, p(E \mid \theta)

and we have on account of (7) and (8)

(9)    \lim_{m \to \infty} \int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m}
           = \int_\Omega W(\theta, \theta^*)\, p(E \mid \theta)\, df
           > \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df,

and

(10)   \lim_{m \to \infty} \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df_{n_m}
           = \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df.


From (9) and (10) it follows that there exists a positive δ such that for sufficiently
large m

       \int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m}
           > \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df_{n_m} + \delta.

Since the sequence of functions {p(E_n | θ)} converges uniformly in θ towards
p(E | θ), we have for sufficiently large m

       \int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m}
           > \int_\Omega W[\theta, \theta_f(E)]\, p(E_{n_m} \mid \theta)\, df_{n_m}.

But this is a contradiction, since θ_{f_{n_m}}(E_{n_m}) is a minimum risk estimate. Hence the
assumption θ* ≠ θ_f(E) is proved to be an absurdity. This proves Lemma 3.

Lemma 4. To each positive ε a bounded and closed subset M_ε of the sample space
M can be given such that

       \int_{M_\varepsilon} p(E \mid \theta)\, dE > 1 - \varepsilon

for every point θ of the parameter space Ω.

Let us assume that Lemma 4 is not true and we shall deduce a contradiction.
Denote by M_ν (ν = 1, 2, ..., ad inf.) the sphere in the sample space M whose
center is the origin and whose radius is equal to ν. Since Lemma 4 is supposed
to be not true, to each ν there exists a parameter point θ_ν such that

(11)   \int_{M_\nu} p(E \mid \theta_\nu)\, dE \le 1 - \varepsilon    (\nu = 1, ..., ad inf.).

Since Ω is compact, there exists a subsequence {θ_{ν_μ}} of the sequence {θ_ν} such that
lim θ_{ν_μ} exists. Denote lim θ_{ν_μ} by θ. Since

       \int_M p(E \mid \theta)\, dE = 1

there exists a positive integer ν′ such that

       \int_{M_{\nu'}} p(E \mid \theta)\, dE > 1 - \varepsilon.

On account of Assumption 3 we get easily

       \lim_{\mu \to \infty} \int_{M_{\nu'}} p(E \mid \theta_{\nu_\mu})\, dE = \int_{M_{\nu'}} p(E \mid \theta)\, dE.

Hence for sufficiently large μ we get

       \int_{M_{\nu_\mu}} p(E \mid \theta_{\nu_\mu})\, dE \ge \int_{M_{\nu'}} p(E \mid \theta_{\nu_\mu})\, dE > 1 - \varepsilon,

in contradiction to (11). This proves Lemma 4.
For any estimate θ̄(E) we shall call the integral

       r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE

the risk function of the estimate θ̄(E). The value of the risk function r(θ) is
for any θ equal to the expected value of the loss (of the weight function) if θ is
the true parameter point.
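As an illustration of this definition, the risk function can be evaluated exactly in a simple discrete case. The sketch below assumes, beyond what the text states, that the sample is the number x of successes in n Bernoulli trials, so that the integral over the sample space becomes a finite sum; the names `risk` and `squared_error` are illustrative.

```python
from math import comb


def risk(theta, estimate, n, weight):
    """r(theta) = sum over x of W[theta, estimate(x)] * p(x | theta)
    for x distributed as Binomial(n, theta)."""
    return sum(
        weight(theta, estimate(x)) * comb(n, x) * theta**x * (1 - theta) ** (n - x)
        for x in range(n + 1)
    )


def squared_error(theta, theta_bar):
    return (theta - theta_bar) ** 2


# Risk of the relative frequency x / n at theta = 0.3 with n = 10.
sample_mean_risk = risk(0.3, lambda x: x / 10, 10, squared_error)
print(sample_mean_risk)
```

For the relative frequency and the squared-error weight the sum equals θ(1 − θ)/n, which is 0.021 in this case.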

Lemma 5. To any positive η a positive δ can be given such that for any estimate
θ̄(E) and for any pair θ, θ′ of parameter points whose Euclidean distance is less
than δ the inequality

       | r(\theta) - r(\theta') | = \left| \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE
           - \int_M W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \eta

holds.

Since W(θ, θ̄) is uniformly continuous, to any ε > 0 a positive δ can be given
such that for any pair of points θ, θ′ whose Euclidean distance is less than δ
the relation

(12)   | W(\theta, \bar\theta) - W(\theta', \bar\theta) | < \varepsilon

holds for every θ̄. On account of Assumption 3, δ can be chosen in such a way
that also the inequality

(13)   | p(E \mid \theta) - p(E \mid \theta') | < \varepsilon

is satisfied for any sample point E of a bounded subset M′ of M and for any
pair θ, θ′ whose Euclidean distance is less than δ.

Since W(θ, θ̄) is continuous and Ω is compact, W(θ, θ̄) must be bounded.
Denote by A an upper bound of W(θ, θ̄). According to Lemma 4 there exists a
bounded and closed subset M′ of the sample space M such that

       \int_{M'} p(E \mid \theta)\, dE > 1 - \frac{\eta}{2A}    for any θ.

It is obvious that

       \left| \int_{M - M'} W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE
           - \int_{M - M'} W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \frac{\eta}{2}.

In order to prove Lemma 5 we have only to show that

(14)   \left| \int_{M'} W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE
           - \int_{M'} W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \frac{\eta}{2}.

On account of (12) and (13), (14) is certainly true for sufficiently small ε.
Hence Lemma 5 is proved.

Lemma 6. If the sequence {f_n(θ)} of distributions converges towards the
distribution f(θ) and if r_{f_n}(θ) denotes the risk function of the minimum risk estimate
θ_{f_n}(E), then {r_{f_n}(θ)} converges uniformly towards the risk function r_f(θ) of the
minimum risk estimate θ_f(E).

According to Lemma 4 to any positive ε a bounded and closed subset M_ε
of M can be given such that

(15)   \int_{M_\varepsilon} p(E \mid \theta)\, dE > 1 - \varepsilon

for every θ. From Lemma 3 it follows easily that {θ_{f_n}(E)} converges uniformly
towards θ_f(E) in M_ε. Hence

       \lim_{n \to \infty} \int_{M_\varepsilon} W[\theta, \theta_{f_n}(E)]\, p(E \mid \theta)\, dE
           = \int_{M_\varepsilon} W[\theta, \theta_f(E)]\, p(E \mid \theta)\, dE

holds for every θ and for every positive ε. Since W(θ, θ̄) is bounded and ε can
be chosen arbitrarily small, we get on account of (15) that

       \lim_{n \to \infty} \int_M W[\theta, \theta_{f_n}(E)]\, p(E \mid \theta)\, dE
           = \int_M W[\theta, \theta_f(E)]\, p(E \mid \theta)\, dE,

that is to say

       \lim_{n \to \infty} r_{f_n}(\theta) = r_f(\theta).

The uniformity of the convergence follows easily from Lemma 5.

In the following argument we shall consider an arbitrary but fixed system of s
parameter points θ_1, ..., θ_s, and point distributions such that no point
other than θ_1, ..., θ_s has positive probability. Such a point distribution is
characterized by a vector p = (p_1, ..., p_s) where p_i denotes the probability of θ_i
(i = 1, ..., s) and Σp_i = 1. The points θ_1, ..., θ_s are kept constant and
only p will vary. Hence if we speak about different distributions p =
(p_1, ..., p_s), p′ = (p′_1, ..., p′_s) they are always related to the same points
θ_1, ..., θ_s unless we state explicitly the contrary.

Lemma 7. If p = (p_1, ..., p_s) and p′ = (p_1 + Δp_1, ..., p_s + Δp_s) denote
two different distributions then

       \sum_i [(\lambda - 1) p_i + \lambda \Delta p_i]\, [r_i(p') - r_i(p)] < 0

holds for any positive λ, where

       r_i(p) = \int_M W[\theta_i, \theta_p(E)]\, p(E \mid \theta_i)\, dE    (i = 1, ..., s),

       r_i(p') = \int_M W[\theta_i, \theta_{p'}(E)]\, p(E \mid \theta_i)\, dE,

and θ_p(E) and θ_{p′}(E) denote the minimum risk estimates corresponding to p and p′
respectively.

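We have, writing p′_i = p_i + Δp_i,

       \sum_i (p_i + \Delta p_i)\, r_i(p) = \int_M \sum_i W[\theta_i, \theta_p(E)]\, p'_i\, p(E \mid \theta_i)\, dE = I_1

and

       \sum_i (p_i + \Delta p_i)\, r_i(p') = \int_M \sum_i W[\theta_i, \theta_{p'}(E)]\, p'_i\, p(E \mid \theta_i)\, dE = I_2.

Since θ_{p′}(E) is the minimum risk estimate corresponding to p′, we have I_1 ≥ I_2.
We shall show that I_1 > I_2. According to Assumption 5 θ_p(E) is not identically
equal to θ_{p′}(E). Hence there exists a point E′ such that θ_p(E′) ≠ θ_{p′}(E′). On
account of Assumption 4

       \sum_i W[\theta_i, \theta_p(E')]\, p'_i\, p(E' \mid \theta_i) > \sum_i W[\theta_i, \theta_{p'}(E')]\, p'_i\, p(E' \mid \theta_i).

From Lemma 3 it follows that θ_p(E) and θ_{p′}(E) are continuous functions of E.
Hence there exists a positive δ and a sphere S with center in E′ such that

       \sum_i W[\theta_i, \theta_p(E)]\, p'_i\, p(E \mid \theta_i) > \sum_i W[\theta_i, \theta_{p'}(E)]\, p'_i\, p(E \mid \theta_i) + \delta

for every point E of S. Since θ_{p′}(E) is the minimum risk estimate corresponding
to p′ we have

       \sum_i W[\theta_i, \theta_p(E)]\, p'_i\, p(E \mid \theta_i) \ge \sum_i W[\theta_i, \theta_{p'}(E)]\, p'_i\, p(E \mid \theta_i)

for every point E outside S. Hence I_1 > I_2, that is to say

(16)   \sum_i (p_i + \Delta p_i)\, r_i(p) > \sum_i (p_i + \Delta p_i)\, r_i(p').

Analogously we get

(17)   \sum_i p_i\, r_i(p) < \sum_i p_i\, r_i(p').

Multiplying (16) by an arbitrary positive value λ and subtracting (17) we get

       \sum_i [\lambda (p_i + \Delta p_i) - p_i]\, r_i(p) > \sum_i [\lambda (p_i + \Delta p_i) - p_i]\, r_i(p').

Hence

       \sum_i [(\lambda - 1) p_i + \lambda \Delta p_i]\, [r_i(p') - r_i(p)] < 0.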

Let us denote for any p the maximum of the numbers r_1(p), ..., r_s(p)
by r(p). We shall call a distribution p for which r(p) becomes a minimum a
risk-minimizing distribution. We shall say that the risk-minimizing distribution
p = (p_1, ..., p_s) is not degenerate if p_1 > 0, ..., p_s > 0. Otherwise we
shall say that p is degenerate.

Lemma 8. There exists at least one risk-minimizing distribution p.

From Lemma 6 it follows that r_1(p), ..., r_s(p) are continuous functions of p.
Hence also r(p) is continuous. Since the set of all possible distributions p is
bounded and closed, there must be at least one distribution p for which r(p)
becomes a minimum.

Lemma 9. If p = (p_1, ..., p_s) denotes a risk-minimizing distribution which is
not degenerate then

       r_1(p) = r_2(p) = ... = r_s(p).

Let us assume that there are two integers i and j, for instance 1 and 2, such
that r_1(p) < r_2(p). We shall deduce a contradiction from this assumption. Let
us consider two different distributions p′ = (p′_1, ..., p′_s) and p″ = (p″_1, ..., p″_s)
where p′_1 > 0 and p″_1 > 0. Hence at least one of the quantities

       (p'_1 - p''_1), ..., (p'_s - p''_s)

is unequal to zero. Since Σp′_i = Σp″_i = 1, also at least one of the quantities

       (p'_2 - p''_2), ..., (p'_s - p''_s)

must be unequal to zero. On account of Lemma 7 we have

       \sum_{i=1}^{s} [(\lambda - 1) p'_i + \lambda (p''_i - p'_i)]\, [r_i(p'') - r_i(p')] < 0.

If we put λ = p′_1 / p″_1, the term with i = 1 vanishes and we get

       \sum_{i=2}^{s} \left[ \left( \frac{p'_1}{p''_1} - 1 \right) p'_i + \frac{p'_1}{p''_1} (p''_i - p'_i) \right] [r_i(p'') - r_i(p')] < 0.

Hence at least one of the quantities

       r_2(p'') - r_2(p'), ..., r_s(p'') - r_s(p')

must be unequal to zero.

Since p_1 > 0, there exists a closed sphere S_p with center at p such that for any
point p′ of S_p, p′_1 > 0. Hence for any two different points p′ and p″ of S_p at
least one of the quantities

       r_2(p'') - r_2(p'), ..., r_s(p'') - r_s(p')

is unequal to zero. Denote by S̄_p the projection of S_p on the s − 1 dimensional
space given by p_1 = 0. Consider the transformation according to which the
image of the point p̄′ = (p′_2, ..., p′_s) of S̄_p is the point q(p̄′) = [r_2(p′), ...,
r_s(p′)]. It is obvious that the images of two different points of S̄_p are different.
Since r_i(p) (i = 1, ..., s) is continuous, the transformation is continuous and
therefore topological. Denote the image of S̄_p by R_p. Since p̄ = (p_2, ..., p_s)
is an interior point of S̄_p, according to the Brouwer-Jordan theorem⁶ on domain
invariance the image q(p̄) = [r_2(p), ..., r_s(p)] of p̄ must also be an interior
point of R_p. Hence for sufficiently small ε > 0 the point

       t(\varepsilon) = [r_2(p) - \varepsilon, ..., r_s(p) - \varepsilon]

is contained in R_p. Denote by p̄(ε) = [p_2(ε), ..., p_s(ε)] the point of S̄_p whose
image is t(ε). It is obvious that

(18)   \lim_{\varepsilon \to 0} \bar p(\varepsilon) = \bar p = (p_2, ..., p_s).

Consider the point p(ε) of S_p whose projection is p̄(ε), that is to say p(ε) has the
co-ordinates 1 − Σ_{i≥2} p_i(ε), p_2(ε), ..., p_s(ε). From (18) it follows that also

(19)   \lim_{\varepsilon \to 0} p(\varepsilon) = p = (p_1, p_2, ..., p_s).

Since r_1[p(ε)], ..., r_s[p(ε)] are continuous functions of ε and since r_1(p) < r_2(p),
for sufficiently small ε the maximum of the numbers

       r_1[p(\varepsilon)],  r_2[p(\varepsilon)] = r_2(p) - \varepsilon,  ...,  r_s[p(\varepsilon)] = r_s(p) - \varepsilon

is certainly smaller than the maximum r(p) of the numbers

       r_1(p), ..., r_s(p),

in contradiction to our assumption that p is a risk-minimizing distribution.
Hence the assumption r_1(p) < r_2(p) is proved to be an absurdity and Lemma 9
is proved.
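Lemma 9 can be checked numerically in a very small case. The following sketch is an illustration under assumptions that do not appear in the text: two hypothetical parameter points 0.25 and 0.75, a sample consisting of the number of successes in 5 Bernoulli trials, and the squared-error weight, for which the minimum risk estimate is the a posteriori mean. The distribution p = (p_1, 1 − p_1) is varied over a grid of non-degenerate values and the one with the smallest maximum risk is selected.

```python
from math import comb

THETAS = (0.25, 0.75)   # hypothetical parameter points theta_1, theta_2
N = 5                   # number of Bernoulli trials (assumed)


def likelihood(x, theta):
    return comb(N, x) * theta**x * (1 - theta) ** (N - x)


def min_risk_estimate(x, p1):
    """A posteriori mean under the point distribution (p1, 1 - p1);
    it minimises the expected squared error."""
    w1 = p1 * likelihood(x, THETAS[0])
    w2 = (1 - p1) * likelihood(x, THETAS[1])
    return (w1 * THETAS[0] + w2 * THETAS[1]) / (w1 + w2)


def risks(p1):
    """(r_1(p), r_2(p)): risk of the minimum risk estimate at each point."""
    return tuple(
        sum((t - min_risk_estimate(x, p1)) ** 2 * likelihood(x, t)
            for x in range(N + 1))
        for t in THETAS
    )


grid = [i / 20 for i in range(1, 20)]   # non-degenerate values of p_1
best_p1 = min(grid, key=lambda p1: max(risks(p1)))
r1, r2 = risks(best_p1)
print(best_p1, r1, r2)
```

On this grid the maximum risk is smallest at p_1 = 0.5, and at that point the two risks coincide, in agreement with the statement of the lemma.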

In the previous arguments we have considered an arbitrary but fixed system
of s parameter points θ_1, ..., θ_s and all distributions p were related to these
points. In the following arguments we shall vary the points θ_1, ..., θ_s and
therefore we shall have to state the parameter points to which the distribution p
is related.

Let us consider a sequence {θ_ν} (ν = 1, ..., ad inf.) of parameter points
which is dense in Ω. We say that a subset ω of Ω is dense in Ω if for each point θ
of Ω any arbitrarily small open neighborhood of θ contains at least one point of ω.
Since Ω is compact, a sequence {θ_ν} which is dense in Ω certainly exists. Let us
consider the first s points θ_1, ..., θ_s of the sequence {θ_ν}. According to
Lemma 8 there exists for any s a risk-minimizing distribution p(s) = [p_1(s),
..., p_s(s)] related to θ_1, ..., θ_s.

Assumption 6. There exists a sequence {θ_s} (s = 1, ..., ad inf.) of parameter
points which is dense in Ω and such that for almost any s⁷ the risk-minimizing
distribution p(s) = [p_1(s), ..., p_s(s)] related to the first s points θ_1, ..., θ_s
is not degenerate.

⁶ See for instance Alexandroff and Hopf, Topologie, Berlin 1935, p. 396.
⁷ By "almost any s" we understand "for all s greater than a sufficiently large integer."

Lemma 10. Denote by {θ_s} (s = 1, 2, ..., ad inf.) a sequence of parameter
points for which the conditions of Assumption 6 are fulfilled. Denote by p(s) =
[p_1(s), ..., p_s(s)] the risk-minimizing distribution related to the first s points
θ_1, ..., θ_s. Then there exists a non-negative constant c such that for any
arbitrarily small positive ε the inequality

       c - \varepsilon < \int_M W[\theta, \theta_{p(s)}(E)]\, p(E \mid \theta)\, dE < c + \varepsilon

holds identically in θ for almost every s. That is to say the risk function of the
minimum risk estimate θ_{p(s)}(E) lies entirely between c − ε and c + ε for almost
every s.

Denote the risk function

       \int_M W[\theta, \theta_{p(s)}(E)]\, p(E \mid \theta)\, dE

of the estimate θ_{p(s)}(E) by r(θ, s). First we shall prove that there exists a
sequence {c_s} (s = 1, ..., ad inf.) of non-negative numbers such that for every
ε > 0 the inequality

(20)   c_s - \varepsilon < r(\theta, s) < c_s + \varepsilon

holds for almost every s. In fact to any positive η a positive integer s_η can be
given such that for any s > s_η the points θ_1, ..., θ_s are η-dense in Ω. That is
to say every point θ of Ω lies in a sphere with radius η and center in one of the
points θ_1, ..., θ_s. Since for sufficiently large s p(s) is not degenerate, we have
on account of Lemma 9 for sufficiently large s

(21)   r(\theta_1, s) = ... = r(\theta_s, s) = c_s.

Since for sufficiently large s the system θ_1, ..., θ_s is η-dense in Ω, we get easily
from Lemma 6 that (20) holds for any positive ε for almost every s.

In order to prove Lemma 10 we have only to show that lim_{s→∞} c_s exists and is
finite. First we see that for no estimate θ̄(E) can the corresponding risk function

       r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE

lie entirely below r(θ, s), that is to say

(22)   r(\theta) < r(\theta, s)

cannot hold for every θ. In fact if (22) were true for a certain estimate θ̄(E) then

       \sum_i p_i(s)\, r(\theta_i) = \int_M \sum_i W[\theta_i, \bar\theta(E)]\, p_i(s)\, p(E \mid \theta_i)\, dE
           < \sum_i p_i(s)\, r(\theta_i, s) = \int_M \sum_i W[\theta_i, \theta_{p(s)}(E)]\, p_i(s)\, p(E \mid \theta_i)\, dE,

which is not possible since θ_{p(s)}(E) is a minimum risk estimate. Hence (22)
cannot hold for every θ. From this fact it follows easily that lim c_s exists and is
finite. This proves Lemma 10.

Lemma 11. Denote by f(θ) a distribution of θ and let θ_f(E) be the corresponding
minimum risk estimate. If θ̄(E) denotes an arbitrary estimate then

       r(\theta) \equiv r_f(\theta)

if θ_f(E) ≠ θ̄(E) only in a set of measure 0, and

       \int_\Omega r(\theta)\, df(\theta) > \int_\Omega r_f(\theta)\, df(\theta)

if θ_f(E) ≠ θ̄(E) in a set of positive measure. r(θ) denotes the risk function of
θ̄(E) and r_f(θ) denotes the risk function of θ_f(E).

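If θ_f(E) ≠ θ̄(E) only in a set of measure zero, then we have obviously r(θ) ≡
r_f(θ). Consider the case that θ_f(E) ≠ θ̄(E) in a set M′ of positive measure.
According to Assumption 4 we have

       \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta) > \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df(\theta)

for any point E of M′. Since

       \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta) = \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df(\theta)

for any other point E of the sample space M, we get

       \int_\Omega r(\theta)\, df = \int_M \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df\, dE
           > \int_M \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df\, dE = \int_\Omega r_f(\theta)\, df.

Hence Lemma 11 is proved.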

We are now able to prove some theorems about the best estimate θ̄(E) relative
to a given weight function. An estimate θ̄(E) is a best estimate according to
our Definition 7 if the maximum of the risk function of θ̄(E) is less than or equal
to the maximum of the risk function of any other estimate and if θ̄(E) is an
admissible estimate (that is to say there exists no estimate whose risk function
r(θ) is not identical to the risk function r̄(θ) of θ̄(E) and satisfies
r̄(θ) ≥ r(θ) in every point θ).

Theorem 1. If θ̄(E) is a best estimate and if the Assumptions 1-6 are fulfilled
then the risk function r̄(θ) of θ̄(E) is constant, that is to say

       \bar r(\theta) \equiv c.

According to Assumption 6 there exists a sequence {θ_s} (s = 1, ..., ad inf.)
of parameter points such that {θ_s} is dense in Ω and for almost every s the
risk-minimizing distribution p(s) related to θ_1, ..., θ_s is not degenerate. On
account of Lemma 10 there exists a non-negative constant c such that for any
ε > 0 the inequality

(23)   c - \varepsilon < r(\theta, s) < c + \varepsilon

holds for almost every s. r(θ, s) denotes the risk function of the estimate
θ_{p(s)}(E). According to Lemma 2 there exists a subsequence {s_n} (n = 1, ...,
ad inf.) of integers such that the sequence {p(s_n)} of distributions converges
towards a distribution f(θ). From Lemma 6 it follows that

       \lim_{n \to \infty} r(\theta, s_n) = r_f(\theta)

where r_f(θ) denotes the risk function of the minimum risk estimate θ_f(E). On
account of (23) we have

       r_f(\theta) \equiv c.

From Lemma 11 it follows that for any other estimate θ̄(E) either

       r(\theta) \equiv r_f(\theta) \equiv c

or

       \int_\Omega r(\theta)\, df > \int_\Omega r_f(\theta)\, df,

where r(θ) denotes the risk function of θ̄(E). In the latter case there exists at
least one point θ for which r(θ) > r_f(θ). Hence θ_f(E) is a best estimate. If
θ̄(E) is also a best estimate, we get on account of Lemma 11 that θ̄(E) can
differ from θ_f(E) only in a set of measure 0 and the risk function of θ̄(E) is
identically equal to c. Hence we have proved Theorem 1 and also the following
Theorems 2-3:

Theorem 2. If the Assumptions 1-6 are fulfilled there exists a distribution
f(θ) of θ such that the corresponding minimum risk estimate θ_f(E) is a best estimate.

Theorem 3. If Assumptions 1-6 are fulfilled and θ̄(E), θ*(E) are best estimates,
then θ̄(E) = θ*(E) almost everywhere and the corresponding risk functions
are identically equal.

Now we shall prove (without making the Assumptions 1-6)

Theorem 4. If W(θ, θ̄) and p(E | θ) are continuous and Ω is compact, and if
f(θ) denotes a distribution of θ such that any open set has a positive probability,
then the minimum risk estimate θ_f(E) is a best estimate if its risk function r_f(θ)
is identically equal to a constant.

Let r/(d) be identically equal to c and consider an arbitrary estimate d(E). 
Since W(d, 6) and p(E\ 6) are continuous and is compact, the risk function 
r(9) of d(E) is a continuous function of 0. Since 6/(E) is a minimum risk 
estimate we have 


f r(0) df > [ r/(0) df = c. 
Jn Jn 


( 24 ) 
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In order to prove Theorem 4, we have to show that either 

(25) r(d) = c 
or there exists a point 6' such that 

(26) r{e') > c. 

If (25) does not hold there exists a point d* such that r(6*) c. If r{d*) > c 
our statement is proved. Consider the case r(9*) < c. On account of the 
continuity of r{0) there exists a positive h and an open neighborhood U of 9* 
such that 

rie) < c - S 

for every 6 in U. Since / df is assumed to be positive, the inequality (24) 

Jv 

can hold only if there exists at least a point 6' for which r(9') > c. This proves 
Theorem 4. 

9. Determination of the best estimate $\bar\theta(E)$ for a certain class of distributions $p(E \mid \theta)$. In this paragraph we shall prove two theorems which enable us to calculate very easily the best estimate $\bar\theta(E)$ for a certain special but important class of distributions.

The risk function of an estimate $\bar\theta(E)$ is given by

$$r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE,$$

where the integration is to be taken over the whole sample space $M$. We consider the integral equation

(27) $\int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE \equiv c,$

where $c$ denotes an arbitrary constant. If we can find an estimate $\bar\theta(E)$ which satisfies (27) for a certain $c$ and which is an admissible estimate relative to the weight function considered, then $\bar\theta(E)$ is certainly a best estimate. If Assumptions 1-6 are fulfilled, an admissible estimate satisfying (27) certainly exists. As we shall see, a best estimate can very easily be determined by the above procedure if the conditions in the following Theorem 5 are fulfilled.

Theorem 5. Let us assume that the following conditions are fulfilled:

I. The parameter space $\Omega$ is one dimensional and $\theta$ can take any real value from $-\infty$ to $+\infty$.

II. The probability density $p(E \mid \theta)$ depends only on the differences $x_1 - \theta, \dots, x_n - \theta$, that is to say $p(E \mid \theta) = p(x_1 - \theta, \dots, x_n - \theta)$, where $x_1, \dots, x_n$ denote the co-ordinates of $E$.

III. The value of the weight function depends only on the difference $u = \theta - \bar\theta$ and is uniformly continuous in $u$.

IV. For any value $\bar\theta$ and for any sample point $E$ the integral

(28) $\psi(\bar\theta, E) = \int_{-\infty}^{+\infty} W(\theta - \bar\theta)\, p(E \mid \theta)\, d\theta$

has a finite value.

V. For every $E$ there exists a finite value $\theta'(E)$ such that $\psi(\bar\theta, E)$ becomes a minimum for $\bar\theta = \theta'(E)$.

Then there exists an estimate $\bar\theta(E)$ such that for any $E$, $\psi(\bar\theta, E)$ becomes a minimum for $\bar\theta = \bar\theta(E)$ and $\bar\theta(E'') - \bar\theta(E') = \lambda$ for any $E' = (x'_1, \dots, x'_n)$ and $E'' = (x''_1, \dots, x''_n)$ for which $x''_1 - x'_1 = \dots = x''_n - x'_n = \lambda$. An estimate with these properties is a best estimate.

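Let us consider two sample points $E' = (x'_1, \dots, x'_n)$ and $E'' = (x''_1, \dots, x''_n)$ such that $x''_1 - x'_1 = \dots = x''_n - x'_n = \lambda$. From the conditions II and III it follows that if $\psi(\bar\theta, E')$ becomes a minimum for $\bar\theta = \bar\theta_1$, then $\psi(\bar\theta, E'')$ becomes a minimum for $\bar\theta = \bar\theta_2 = \bar\theta_1 + \lambda$. Hence there exists an estimate $\bar\theta(E) = \bar\theta(x_1, \dots, x_n)$ such that for any $E$, $\psi(\bar\theta, E)$ becomes a minimum for $\bar\theta = \bar\theta(E)$ and $\bar\theta(E'') - \bar\theta(E') = \lambda$ if $x''_1 - x'_1 = \dots = x''_n - x'_n = \lambda$. We shall show that such an estimate $\bar\theta(E)$ is a best estimate. First we shall show that the risk function

$$r(\theta) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta - \bar\theta(E)]\, p(x_1 - \theta, \dots, x_n - \theta)\, dx_1 \cdots dx_n$$

is constant. Let us consider two arbitrary parameter values $\theta'$ and $\theta''$. Then we have

$$r(\theta') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta' - \bar\theta(E)]\, p(x_1 - \theta', \dots, x_n - \theta')\, dx_1 \cdots dx_n,$$

$$r(\theta'') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta'' - \bar\theta(E)]\, p(x_1 - \theta'', \dots, x_n - \theta'')\, dx_1 \cdots dx_n.$$

Making in the second integral the transformation

$$y_1 = x_1 - (\theta'' - \theta'), \ \dots, \ y_n = x_n - (\theta'' - \theta'),$$

we get

$$r(\theta'') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W\{\theta'' - \bar\theta[y_1 + (\theta'' - \theta'), \dots, y_n + (\theta'' - \theta')]\}\, p(y_1 - \theta', \dots, y_n - \theta')\, dy_1 \cdots dy_n$$

$$= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta' - \bar\theta(y_1, \dots, y_n)]\, p(y_1 - \theta', \dots, y_n - \theta')\, dy_1 \cdots dy_n.$$

The last step uses the property $\bar\theta(y_1 + \lambda, \dots, y_n + \lambda) = \bar\theta(y_1, \dots, y_n) + \lambda$ with $\lambda = \theta'' - \theta'$. Hence $r(\theta') = r(\theta'')$ and our statement that $r(\theta)$ is constant is proved. In order to prove Theorem 5, we have only to show that $\bar\theta(E)$ is an admissible estimate. For this purpose let us consider an arbitrary estimate $\theta^*(E)$ and denote the corresponding risk function by $r^*(\theta)$. Since $\bar\theta(E)$ minimizes the integral (28), we have

(29) $\psi[\theta^*(E), E] \ge \psi[\bar\theta(E), E]$

for all sample points $E$. Let us consider the integral

(30) $I = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \{W[\theta - \bar\theta(E)] - W[\theta - \theta^*(E)]\}\, p(E \mid \theta)\, d\theta\, dx_1 \cdots dx_n.$

Integrating (30) with respect to $\theta$ we get

(31) $I = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \{\psi[\bar\theta(E), E] - \psi[\theta^*(E), E]\}\, dx_1 \cdots dx_n.$

Integrating (30) with respect to $E$, we get

(32) $I = \int_{-\infty}^{+\infty} [r(\theta) - r^*(\theta)]\, d\theta.$

On account of (29) and (31) we have $I \le 0$, hence

(33) $\int_{-\infty}^{+\infty} [r(\theta) - r^*(\theta)]\, d\theta \le 0.$

From (33) it follows that if $r^*(\theta) \le r(\theta)$ for every $\theta$ then $r^*(\theta) < r(\theta)$ can hold only for the points of a set of measure zero. In case $r^*(\theta)$ is continuous, this means that $r^*(\theta) \equiv r(\theta)$. Hence if $r^*(\theta)$ is continuous, then either $r^*(\theta) \equiv r(\theta)$ or there exists at least one point $\theta'$ such that $r^*(\theta') > r(\theta')$. The risk function $r^*(\theta)$ is continuous if the estimate $\theta^*(E)$ is uniformly continuous in the whole sample space. In fact, we have

$$r^*(\theta + \epsilon) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta + \epsilon - \theta^*(E)]\, p(x_1 - \theta - \epsilon, \dots, x_n - \theta - \epsilon)\, dx_1 \cdots dx_n.$$

Making the transformation

$$y_i = x_i - \epsilon \qquad (i = 1, \dots, n)$$

we get

$$r^*(\theta + \epsilon) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta + \epsilon - \theta^*(y_1 + \epsilon, \dots, y_n + \epsilon)]\, p(y_1 - \theta, \dots, y_n - \theta)\, dy_1 \cdots dy_n.$$

Since $W(u)$ and $\theta^*(E)$ are uniformly continuous, from the latter equation we get easily

$$\lim_{\epsilon \to 0} r^*(\theta + \epsilon) = r^*(\theta),$$

that is to say $r^*(\theta)$ is continuous. Considering only continuous estimates the admissibility of $\bar\theta(E)$, and therefore also Theorem 5, is proved. If $\theta^*(E)$ is not uniformly continuous we have only proved that if $r^*(\theta) \le r(\theta)$ for every $\theta$, then $r^*(\theta) < r(\theta)$ can hold only in a set of measure zero. I should like to mention without proof that even if $\theta^*(E)$ is not continuous, $r^*(\theta) \le r(\theta)$ implies

$$r^*(\theta) \equiv r(\theta).$$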

An estimate $\hat\theta(E)$ is called a maximum likelihood estimate if for any fixed $E$, $p(E \mid \theta)$ becomes a maximum with respect to $\theta$ for $\theta = \hat\theta(E)$.

Theorem 6. Consider the following conditions:

VI. There exists exactly one maximum likelihood estimate $\hat\theta(E)$ with the following properties:

a) For any $E$, $p(E \mid \theta)$ is non-decreasing with increasing $\theta$ for $\theta < \hat\theta(E)$ and non-increasing with increasing $\theta$ for $\theta > \hat\theta(E)$.

b) For any $E$, $p(E \mid \theta)$ is a symmetric function of $\theta$ about $\hat\theta(E)$, that is to say, for any real value $\lambda$, $p[E \mid \hat\theta(E) - \lambda] = p[E \mid \hat\theta(E) + \lambda]$.

VII. The value of the weight function depends only on the absolute value of the difference $u = \theta - \bar\theta$, and $dW(u)/du$ exists, is uniformly continuous and $> 0$ for $u > 0$.

If the conditions I-V of Theorem 5 and the above condition VII are fulfilled, and if $\hat\theta(E)$ is a maximum likelihood estimate satisfying VI, then $\hat\theta(E)$ is a best estimate.

Assume that the conditions I-V and VII are satisfied and that $\hat\theta(E)$ is a maximum likelihood estimate satisfying VI. It is obvious that $\hat\theta(E'') - \hat\theta(E') = \lambda$ for $E' = (x_1, \dots, x_n)$ and $E'' = (x_1 + \lambda, \dots, x_n + \lambda)$. In order to prove Theorem 6, we have, according to Theorem 5, only to show that the integral in (28)

$$\psi(\bar\theta, E) = \int_{-\infty}^{+\infty} W(\theta - \bar\theta)\, p(E \mid \theta)\, d\theta$$

becomes a minimum for $\bar\theta = \hat\theta(E)$. Denote $\theta - \bar\theta$ by $u$. Since $dW(u)/du$ is uniformly continuous, we have

$$\frac{\partial \psi(\bar\theta, E)}{\partial \bar\theta} = -\int_{-\infty}^{+\infty} \frac{dW(u)}{du}\, p(E \mid \theta)\, d\theta.$$

Since $W(u) = W(-u)$, the derivative $dW(u)/du$ is an odd function of $u$, and substituting $\theta = \bar\theta + u$ we have

(34) $\frac{\partial \psi(\bar\theta, E)}{\partial \bar\theta} = \int_0^{\infty} \frac{dW(u)}{du}\, [p(E \mid \bar\theta - u) - p(E \mid \bar\theta + u)]\, du.$

From condition VI it follows easily that for any fixed $E$ and $\bar\theta$ the function of $u$ ($0 < u < \infty$)

$$p(E \mid \bar\theta - u) - p(E \mid \bar\theta + u)$$

does not change its sign, and if $\bar\theta \ne \hat\theta(E)$ there exists an interval $J$ such that the above expression is unequal to zero for every point $u$ of $J$. Hence on account of $dW(u)/du > 0$ for $u > 0$, the integral in (34) vanishes only for $\bar\theta = \hat\theta(E)$. Since according to the condition V there exists a finite value $\theta'$ such that $\psi(\bar\theta, E)$ becomes a minimum for $\bar\theta = \theta'$, $\theta'$ must be equal to $\hat\theta(E)$. This proves Theorem 6.

The condition VI is seldom exactly fulfilled. But for large $n$, in the great majority of practical cases, VI will be fulfilled with good approximation and the best estimate approaches the maximum likelihood estimate with increasing $n$.


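10. Two examples. As a first example we consider a normal distribution with the variance 1. The mean value $\theta$ is unknown and we have to estimate it by means of a sample $E = (x_1, \dots, x_n)$. In this case

$$p(E \mid \theta) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2}.$$

It is obvious that for a very broad class of weight functions the conditions I-V of Theorem 5 are fulfilled. The maximum likelihood estimate $\hat\theta(x_1, \dots, x_n) = (x_1 + \dots + x_n)/n$ satisfies the condition VI of Theorem 6. Hence if the weight function satisfies also the condition VII, then the best estimate of $\theta$ is the maximum likelihood estimate $\hat\theta(x_1, \dots, x_n) = (x_1 + \dots + x_n)/n$.

Let us now consider a weight function defined as follows:

$$W(\theta, \bar\theta) = 2(\bar\theta - \theta) \quad \text{if } \bar\theta > \theta$$

and

$$W(\theta, \bar\theta) = \theta - \bar\theta \quad \text{if } \bar\theta \le \theta.$$

Since for this weight function the conditions I-V are satisfied, according to Theorem 5 the best estimate of $\theta$ is the value $\bar\theta$ for which the integral

$$\int_{-\infty}^{+\infty} W(\theta, \bar\theta)\, e^{-\frac{1}{2}\sum (x_i - \theta)^2}\, d\theta = \int_{-\infty}^{\bar\theta} 2(\bar\theta - \theta)\, e^{-\frac{1}{2}\sum (x_i - \theta)^2}\, d\theta + \int_{\bar\theta}^{+\infty} (\theta - \bar\theta)\, e^{-\frac{1}{2}\sum (x_i - \theta)^2}\, d\theta$$

becomes a minimum. As an easy calculation shows, the estimate obtained in this way is not the arithmetic mean.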
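The "easy calculation" can be checked numerically. The sketch below is an illustration only and not part of the original argument. It assumes the asymmetric weight function in the form $2(\bar\theta - \theta)$ for overestimates and $\theta - \bar\theta$ for underestimates, and it uses the fact that $p(E \mid \theta)$, regarded as a function of $\theta$, is proportional to a normal density with mean $\bar x$ and variance $1/n$. Setting the derivative of the integral to zero then gives $2F(\bar\theta) = 1 - F(\bar\theta)$, so the minimiser is the 1/3 quantile of that density. All function names are hypothetical.

```python
from statistics import NormalDist


def weight(theta, theta_bar):
    """Asymmetric weight: overestimates cost twice as much as underestimates."""
    if theta_bar > theta:
        return 2.0 * (theta_bar - theta)
    return theta - theta_bar


def best_estimate_normal(sample):
    """Minimiser of psi(theta_bar, E) for N(theta, 1) observations.

    As a function of theta, p(E | theta) is proportional to a normal density
    with mean x_bar and standard deviation n ** -0.5, so the minimiser is the
    1/3 quantile of that density.
    """
    n = len(sample)
    x_bar = sum(sample) / n
    return NormalDist(mu=x_bar, sigma=n ** -0.5).inv_cdf(1.0 / 3.0)


def psi_numeric(theta_bar, sample, steps=1600):
    """Midpoint-rule value of psi, up to a constant factor that does not move the minimum."""
    n = len(sample)
    x_bar = sum(sample) / n
    dist = NormalDist(mu=x_bar, sigma=n ** -0.5)
    lo, hi = x_bar - 8.0 * dist.stdev, x_bar + 8.0 * dist.stdev
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        theta = lo + (j + 0.5) * h
        total += weight(theta, theta_bar) * dist.pdf(theta)
    return total * h


sample = [0.3, -1.2, 0.8, 1.9, 0.2]
estimate = best_estimate_normal(sample)
mean = sum(sample) / len(sample)
# The best estimate under this weight function lies below the arithmetic mean.
print(estimate, mean)
```

The shift away from the mean is of order $n^{-1/2}$, which agrees with the remark that the best estimate approaches the maximum likelihood estimate as $n$ increases.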

As a second example we consider the family of variates $X(\theta)$ with the probability density $f(x, \theta)$ defined as follows:

$$f(x, \theta) = 1 \quad \text{if } \theta - \tfrac{1}{2} \le x \le \theta + \tfrac{1}{2}$$

and

$$f(x, \theta) = 0 \quad \text{for all other values of } x.$$

If $E = (x_1, \dots, x_n)$ denotes a sample point where $x_1$ denotes the smallest and $x_n$ denotes the greatest value in the sample, then

$$p(E \mid \theta) = \prod_{i=1}^{n} f(x_i, \theta) = 1 \quad \text{if } x_n - \tfrac{1}{2} \le \theta \le x_1 + \tfrac{1}{2}$$

and

$$p(E \mid \theta) = 0 \quad \text{for all other values of } \theta.$$

The classical method of maximum likelihood cannot be applied here, since $p(E \mid \theta)$ is maximum for every value $\theta$ for which $x_n - \tfrac{1}{2} \le \theta \le x_1 + \tfrac{1}{2}$. It is obvious that for a broad class of weight functions the conditions I-V are satisfied. The estimate $\hat\theta(E) = (x_1 + x_n)/2$, where $x_1$ denotes the smallest and $x_n$ the greatest value in the sample, satisfies the condition VI. Hence if the weight function satisfies also the condition VII, the best estimate of $\theta$ is given by

$$\frac{x_1 + x_n}{2}.$$


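Let us now calculate the best estimate of $\theta$ if the weight function is given as follows:

$$W(\theta, \bar\theta) = \theta - \bar\theta \quad \text{if } \bar\theta \le \theta$$

and

$$W(\theta, \bar\theta) = 2(\bar\theta - \theta) \quad \text{if } \bar\theta > \theta.$$

In this case the conditions I-V are satisfied but not the condition VII. We have to calculate the integral $\psi(\bar\theta, E)$ given in (28), which reduces in this case to

$$\psi(\bar\theta, E) = \int_{x_n - \frac{1}{2}}^{x_1 + \frac{1}{2}} W(\theta, \bar\theta)\, d\theta = \int_{x_n - \frac{1}{2}}^{\bar\theta} 2(\bar\theta - \theta)\, d\theta + \int_{\bar\theta}^{x_1 + \frac{1}{2}} (\theta - \bar\theta)\, d\theta$$

$$= 1.5\,\bar\theta^2 - [(x_1 + \tfrac{1}{2}) + 2(x_n - \tfrac{1}{2})]\,\bar\theta + \tfrac{1}{2}(x_1 + \tfrac{1}{2})^2 + (x_n - \tfrac{1}{2})^2.$$

This expression becomes a minimum for

$$\bar\theta = \frac{x_1 + 2x_n - \tfrac{1}{2}}{3}.$$

Hence the best estimate of $\theta$ is given by this expression.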
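The closed form for the rectangular example can be verified with a few lines of code. The sketch below is illustrative only and the function names are hypothetical. It evaluates $\psi(\bar\theta, E)$ both as the sum of the two elementary integrals and as the quadratic polynomial, and compares a grid search with the closed-form minimiser $(x_1 + 2x_n - \tfrac{1}{2})/3$. It assumes a sample whose range does not exceed 1, so that the interval $[x_n - \tfrac{1}{2},\, x_1 + \tfrac{1}{2}]$ is not empty.

```python
def psi_uniform(theta_bar, x_min, x_max):
    """psi(theta_bar, E) as the sum of the two integrals, valid for a <= theta_bar <= b."""
    a, b = x_max - 0.5, x_min + 0.5  # p(E | theta) equals 1 exactly on [a, b]
    return (theta_bar - a) ** 2 + 0.5 * (b - theta_bar) ** 2


def psi_polynomial(theta_bar, x_min, x_max):
    """The same quantity written as the quadratic polynomial in theta_bar."""
    a, b = x_max - 0.5, x_min + 0.5
    return 1.5 * theta_bar ** 2 - (b + 2.0 * a) * theta_bar + 0.5 * b ** 2 + a ** 2


def best_estimate_uniform(sample):
    """Closed-form minimiser (x_1 + 2 x_n - 1/2) / 3."""
    return (min(sample) + 2.0 * max(sample) - 0.5) / 3.0


sample = [0.10, 0.35, -0.20, 0.25]
x_min, x_max = min(sample), max(sample)
a, b = x_max - 0.5, x_min + 0.5
grid = [a + (b - a) * i / 1000 for i in range(1001)]
grid_minimiser = min(grid, key=lambda t: psi_uniform(t, x_min, x_max))
print(best_estimate_uniform(sample), grid_minimiser)
```

Because the minimiser equals $(2a + b)/3$ with $a = x_n - \tfrac{1}{2}$ and $b = x_1 + \tfrac{1}{2}$, it always lies inside the interval on which the likelihood is positive, one third of the way from its lower end.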


11. Miscellaneous remarks. Assumptions 1-6 of paragraph 8 are sufficient but not necessary for the proof of the Theorems 1-3 (Theorems 4-6 have been deduced without Assumptions 1-6). They can be weakened in many respects. The assumption that the parameter space is bounded can be dropped if we impose certain conditions on the weight function $W(\theta, \bar\theta)$ and the probability density $p(E \mid \theta)$. It is certainly not necessary to assume that $W(\theta, \bar\theta)$ and $p(E \mid \theta)$ are everywhere continuous. It is however doubtful whether Theorems 1-3 remain valid in the form in which they are stated, if we admit discontinuities in a set of measure zero without imposing any other restrictions. Also Assumptions 4-6 can in all probability be essentially weakened.

I should like to mention that Assumption 4 is not as restrictive as it would appear. Let us make this clear in the case that the parameter space is a one-dimensional interval $[a, b]$. If we assume that $W(\theta, \bar\theta)$ is a polynomial of the second degree in $\bar\theta$ and the coefficient of $\bar\theta^2$ is positive for every $\theta$, and if $p(E \mid \theta) > 0$ for every $E$ and $\theta$, the Assumption 4 can easily be proved. In fact,

$$\psi(\bar\theta, E) = \int_\Omega W(\theta, \bar\theta)\, p(E \mid \theta)\, df(\theta) = A(E) + B(E)\,\bar\theta + C(E)\,\bar\theta^2.$$

Since the coefficient of $\bar\theta^2$ in $W(\theta, \bar\theta)$ is positive and since $p(E \mid \theta) > 0$ for every $E$ and $\theta$, $C(E) > 0$ for every $E$ and for any arbitrary distribution $f(\theta)$. From this fact it follows easily that for every $E$ there exists a value $\bar\theta(E)$ in the interval $[a, b]$ such that

$$\psi[\bar\theta(E), E] < \psi(\bar\theta, E)$$

for every $\bar\theta$ contained in $[a, b]$ and unequal to $\bar\theta(E)$. Hence Assumption 4 is proved.

Let us consider a system $S$ of subsets of the parameter space $\Omega$ and the corresponding system $H_S$ of hypotheses. The weight function $W(\theta, \omega)$ is defined for all points $\theta$ of $\Omega$ and for all elements $\omega$ of $S$ and expresses the weight of the error committed by accepting $H_\omega$ when $\theta$ is true. If $\theta$ is an element of $\omega$ then $W(\theta, \omega)$ is of course equal to zero. Let us assume that $W(\theta, \omega)$ has the special form: $W(\theta, \omega) = 1$ if $\theta$ is not contained in $\omega$, and $W(\theta, \omega) = 0$ if $\theta$ is an element of $\omega$. It is obvious that in this case for any $\theta$ the value of the risk function $r(\theta)$ is equal to the probability of accepting a false hypothesis if $\theta$ is the true parameter point. Because of this fact the theory developed here has close relation to the theory of confidence intervals. Let us first make this clear for the case when the parameter space is one dimensional, that is to say $\theta$ is a real number.

In the theory of confidence intervals we estimate the unknown parameter $\theta$ by an interval $I(E)$ extending from $\theta_1(E)$ to $\theta_2(E)$ where $\theta_1(E)$ and $\theta_2(E)$ are certain functions of the sample point $E$. The interval $I(E)$ is defined in such a way that the following probability statement holds: If we perform an experiment, the probability that we shall obtain a sample point $E$ such that $I(E)$ will cover the true parameter point $\theta$, is equal to a given constant $\alpha$ (called confidence coefficient) and is independent of the value of $\theta$. Let us consider a certain example of such an inference with the confidence coefficient $\alpha$ and denote by $I(E)$ the interval corresponding to $E$. We define a system $S$ of intervals as follows: An interval $I$ is an element of $S$ if and only if there exists a sample point $E$ for which $I(E) = I$. Consider the corresponding system $H_S$ of hypotheses and the weight function $W(\theta, I)$ defined for all values $\theta$ and all elements $I$ of $S$ as follows:

$W(\theta, I) = 0$ if $\theta$ is a point of $I$,

$W(\theta, I) = 1$ if $\theta$ is not contained in $I$.

Denote by $M_S$ a best system of regions of acceptance relative to the weight function defined above. Denote by $I'(E)$ the element of $S$ which we accept according to $M_S$ if $E$ is the sample point. On account of the special form of the weight function, the risk is obviously equal to the probability of accepting a false interval. From the definition of the best system of regions it follows that for any $\theta$ the probability that $I'(E)$ will cover $\theta$ is greater than or equal to $\alpha$. If the risk function is constant, that is to say, if the probability that $I'(E)$ will cover the true parameter point $\theta$ is independent of the value of $\theta$, then the intervals $I'(E)$ are confidence intervals corresponding to a confidence coefficient $\alpha' \ge \alpha$.
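The identification of the risk with the probability of non-coverage can be illustrated by simulation. The following sketch is not taken from the paper. It assumes normal observations with unit variance and the familiar interval $\bar x \pm z/\sqrt{n}$, and it estimates the risk under the 0-1 weight function by Monte Carlo with a fixed seed. The function name and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist


def noncoverage_risk(theta, n=5, alpha=0.95, trials=20000, seed=0):
    """Monte Carlo estimate of r(theta) under the 0-1 weight function."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(0.5 + alpha / 2.0) / n ** 0.5
    misses = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(x_bar - theta) > half_width:  # the interval fails to cover theta: weight 1
            misses += 1
    return misses / trials


for theta in (-3.0, 0.0, 10.0):
    print(theta, noncoverage_risk(theta))
```

For this interval the estimated risk stays near $1 - \alpha$ whatever value of $\theta$ is used, which is the constant-risk situation described above.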

Similar observations can be made if the parameter space is $k$-dimensional ($k > 1$), that is to say, $\theta$ is a system of $k$ numbers $\theta^{(1)}, \dots, \theta^{(k)}$. An important case is that when we have to estimate only one of the components, say $\theta^{(1)}$, by an interval. As the investigations of W. Feller* have shown, confidence intervals in such cases do not exist always. That is to say, it is not always possible to determine $I(E)$ such that the probability that $I(E)$ will cover $\theta^{(1)}$ is equal to a given constant $\alpha$ independently of the values of $\theta^{(1)}, \dots, \theta^{(k)}$. It is of great interest to know under what conditions confidence intervals exist. I should like to mention that a further development of the theory given in paragraph 8 may contribute much to the solution of this problem. In order to make this clear, let us consider a system $S_1$ of one-dimensional intervals. To each element $I$ of $S_1$ let there correspond the subset $\omega$ of the $k$-dimensional parameter space $\Omega$ consisting of all points $\theta = (\theta^{(1)}, \dots, \theta^{(k)})$ for which $\theta^{(1)}$ lies in $I$. Consider the system $S$ of subsets $\omega$ of $\Omega$ corresponding to all elements of $S_1$ and the system $H_S$ of hypotheses corresponding to $S$. The weight function is to be chosen as follows: $W(\theta, \omega) = 1$ if $\theta$ is not an element of $\omega$ and $W(\theta, \omega) = 0$ if $\theta$ is an element of $\omega$. Consider a best system $M_S$ of regions of acceptance and the corresponding risk function $r(\theta)$. On account of the special definition of $W(\theta, \omega)$, $r(\theta)$ is equal to the probability of accepting a false hypothesis if $\theta$ is the true parameter point. If the risk function $r(\theta)$ is identically equal to a constant $\alpha$, the probability of covering the true value is $1 - \alpha$ for every $\theta$, and we have confidence intervals corresponding to the confidence coefficient $1 - \alpha$. In order to see under what conditions the risk function is constant, we have to consider an equivalent problem (see paragraph 7) where the system of hypotheses is the system of all simple hypotheses and the weight function $W(\theta, \bar\theta)$ is given according to formula (4). If $W(\theta, \bar\theta)$ satisfies Assumptions 1-6, the risk function of the best estimate is constant. As we have mentioned, Assumptions 1-6 can be weakened. In order to get valuable results concerning the problem of the existence of confidence intervals, we have to weaken especially Assumption 2. In fact $W(\theta, \omega)$ takes only the values 1 and 0 and therefore $W(\theta, \bar\theta)$ cannot be continuous.

* W. Feller, "Note on Regions Similar to the Sample Space," Statistical Research Memoirs, Vol. II, 1938.

Finally I should like to mention that the most stringent test as defined by Robert W. B. Jackson† is contained as special case in our general definition of the best system of regions of acceptance. Jackson considers a discontinuous parameter space $\Omega$. Consider the problem of testing the hypothesis $\theta = \theta_0$, where $\theta_0$ denotes a point of $\Omega$. According to Jackson's definition we have the most stringent test if the critical region $w_0$ satisfies the condition: the maximum of the numbers $A$ and $B$,

$A = P(E \in w \mid \theta_0)$, $B$ = least upper bound of $P(E \in \bar w \mid \theta)$ formed for all $\theta \ne \theta_0$,

becomes a minimum for $w = w_0$ ($\bar w$ denotes the region complementary to $w$). It is easy to see that Jackson's definition of the most stringent test coincides with our definition of the best system of regions of acceptance in the following special case:

1) $\Omega$ is discontinuous. 2) $S$ consists only of two elements.

3) The weight function $W(\theta, \omega)$ is equal to 1 if $\theta$ is not contained in $\omega$.

Columbia University.

† Robert W. B. Jackson, "Testing Statistical Hypotheses," Statistical Research Memoirs, Vol. I, 1936.



THE DISTRIBUTION OF THE MULTIPLE CORRELATION 
COEFFICIENT IN PERIODOGRAM ANALYSIS 

By D. M. Starkey


1. Geometrical interpretation of the problem. We begin with a summary of some recent work by Hotelling, in a form relevant to this particular problem.¹ He suggests that the general question of finding the distribution of the multiple correlation coefficient corresponding to a fitted regression of $y$ upon $x$ may be solved by evaluating definite integrals corresponding to invariants of certain curves, surfaces, etc. For the purposes of illustration we may consider the case of fitting the relation

$$y = a + b f(x, k, \epsilon)$$

where $f$ is an arbitrary function, and $a$, $b$, $k$, $\epsilon$ are constants, to the observations $y$, where we are given $n$ values of $y$, $y_1, y_2, \dots, y_n$ and the corresponding values of $x$, $x_1, x_2, \dots, x_n$. We shall postulate that the $y$'s are independent and normally distributed about a certain mean and that the regression may be fitted by means of the principle of least squares.

We must minimize the sum of squares

$$\sum_{\alpha=1}^{n} (y_\alpha - Y_\alpha)^2 = \sum_{\alpha=1}^{n} [y_\alpha - a - b f(x_\alpha, k, \epsilon)]^2$$

and hence we differentiate with respect to $a$, obtaining the first condition for a minimum

$$\sum_{\alpha=1}^{n} [y_\alpha - a - b f(x_\alpha, k, \epsilon)] = 0.$$

In the following, all summations take place over a range $\alpha = 1$ to $n$. Then we have

$$a = \bar y - b \bar f$$

where

$$\bar y = \frac{\sum y_\alpha}{n}, \qquad \bar f = \frac{\sum f(x_\alpha, k, \epsilon)}{n}.$$

Thus we minimize the sum of squares

$$\sum [(y_\alpha - \bar y) - b(f(x_\alpha, k, \epsilon) - \bar f)]^2$$

or, putting $y'_\alpha = y_\alpha - \bar y$,

$$Y'_\alpha = Y_\alpha - \bar Y = b[f(x_\alpha, k, \epsilon) - \bar f],$$

we see that the quantity $\sum (y'_\alpha - Y'_\alpha)^2$ is to be minimized.

¹ Harold Hotelling, "Tubes and spheres in n-spaces, and a class of statistical problems," American Journal of Mathematics, April, 1939.

Geometrically we may regard the set of values $(y_1, \dots, y_n)$ as defining a point in $n$-space, and $(Y_1, \dots, Y_n)$ will also represent a point in $n$-space on the 4-dimensional surface which may be obtained by eliminating $a$, $b$, $k$, $\epsilon$ from the relations $Y = a + b f(x, k, \epsilon)$. The points $(y'_1, \dots, y'_n)$ and $(Y'_1, \dots, Y'_n)$ represent the orthogonal projections of $(y_1, \dots, y_n)$ and $(Y_1, \dots, Y_n)$ on the plane $\sum y_\alpha = 0$. Hence we have to minimize the distance between these projections, noticing that $(Y'_1, \dots, Y'_n)$ now lies on the 3-dimensional projection of the surface on which $(Y_1, \dots, Y_n)$ lies. The multiple correlation between the observed and fitted values is defined as

$$R = \frac{\sum y'_\alpha Y'_\alpha}{\sqrt{\sum (y_\alpha - \bar y)^2\, \sum (Y_\alpha - \bar Y)^2}}$$

and this is equal to $\cos\theta$, where $\theta$ is the angle between the lines joining the origin to the points $(y'_1, \dots, y'_n)$ and $(Y'_1, \dots, Y'_n)$. For the purpose of evaluating $R$ we may thus consider the projections of these points on the unit sphere in $\sum y_\alpha = 0$ with centre the origin, these being

$$\left( \frac{y'_1}{\sqrt{\sum y'^2_\alpha}}, \dots, \frac{y'_n}{\sqrt{\sum y'^2_\alpha}} \right) \quad \text{and} \quad \left( \frac{Y'_1}{\sqrt{\sum Y'^2_\alpha}}, \dots, \frac{Y'_n}{\sqrt{\sum Y'^2_\alpha}} \right).$$

As by hypothesis the distribution of $y$ has spherical symmetry about some point on the line $y_1 = y_2 = \dots = y_n$, the distribution of $y'$ has spherical symmetry about the origin, and the probability distribution of the projection of $y'$ on the unit sphere is uniform. The projection of $Y'$ lies on a 2-dimensional surface on the $(n - 2)$-dimensional sphere, and for a given $Y'$ the probability that $R$ is as great or greater than $\cos\theta$ is proportional to the volume of the sphere in the $(n - 2)$-dimensional spherical space with centre $Y'$ and geodesic radius $\theta$, so that the total probability that $R$ lies between $\cos\theta$ and 1 is equal to the ratio of the "area" of the portion of the unit sphere included by the envelope of these geodesic spheres to the "area" of the unit sphere. This envelope is that part of the unit sphere in $\sum y_\alpha = 0$ which is at a geodesic distance $\theta$ from the 2-dimensional surface on which the projection of $Y'$ lies, termed a "tube" by Hotelling.
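The identity $R = \cos\theta$ can be made concrete with a small computation. The sketch below is illustrative only and its names are hypothetical. For simplicity it fits only $a$ and $b$ by least squares with $k$ and $\epsilon$ held fixed (the full problem also minimises over $k$ and $\epsilon$), taking $f(x, k, \epsilon) = \cos(kx + \epsilon)$ and $x = 0, 1, \dots, n - 1$. It then computes $R$ as the cosine of the angle between the centred vectors $y'$ and $Y'$.

```python
import math


def multiple_correlation(y, fitted):
    """R as the cosine of the angle between the centred vectors y' and Y'."""
    n = len(y)
    y_mean, f_mean = sum(y) / n, sum(fitted) / n
    yc = [v - y_mean for v in y]
    fc = [v - f_mean for v in fitted]
    dot = sum(p * q for p, q in zip(yc, fc))
    return dot / math.sqrt(sum(p * p for p in yc) * sum(q * q for q in fc))


def fit_for_fixed_k_eps(y, k, eps):
    """Least-squares a, b in Y = a + b cos(k x + eps) with x = 0, 1, ..., n - 1."""
    n = len(y)
    f = [math.cos(k * x + eps) for x in range(n)]
    y_mean, f_mean = sum(y) / n, sum(f) / n
    b = (sum((p - y_mean) * (q - f_mean) for p, q in zip(y, f))
         / sum((q - f_mean) ** 2 for q in f))
    a = y_mean - b * f_mean  # first normal equation: a = y_bar - b * f_bar
    return [a + b * q for q in f]


y = [1.0, 2.5, 1.8, 0.2, -0.7, 0.4, 1.9, 2.2]
fitted = fit_for_fixed_k_eps(y, k=1.1, eps=0.4)
R = multiple_correlation(y, fitted)
theta = math.acos(R)  # angle between the two centred vectors
print(R, theta)
```

A small angle $\theta$ corresponds to $R$ close to 1, which is why the tail probability of $R$ is expressed through the volume of a thin tube around the surface traced out by the normalised $Y'$.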

For very small values of $\theta$ it may be assumed that this ratio is equal to the area of the two-dimensional surface on which $Y'$ lies, multiplied by a fixed multiple of $\theta^{n-4}$. This is fairly evident intuitively, but has recently been substantiated by some results of Weyl² who shows that this is correct for small values of $\theta$, and indicates a series from which could be derived a series of ascending powers of $\theta$ by which successive adjustments could be made for larger values of $\theta$. The coefficients in this series are finite invariants of the surface in which we are working. If we accept the first approximation we must consider the question of the extent of the surface, which depends on the range of values of the parameters $k$, $\epsilon$. The range which is eventually chosen depends on the needs of the practical statistician, while keeping in view the mathematical possibilities of effecting a solution. In the following work we consider in particular the case of periodogram analysis by putting $f(x, k, \epsilon) = \cos(kx + \epsilon)$.

² H. Weyl, "On the volume of tubes," American Journal of Mathematics, April, 1939.


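2. The case of periodogram analysis. With the notation of the preceding paragraph, we fit

$$Y_\alpha = a + b \cos(k x_\alpha + \epsilon)$$

to data $(x_\alpha, y_\alpha)$, $\alpha = 1, 2, \dots, n$.

We shall assume that the variate $x$ is a measurement of time or some other quantity for which the measurements are made at equal intervals, taken as unity for convenience, so that

$$x_1 = 0, \quad x_2 = 1, \quad \dots, \quad x_n = n - 1.$$

Now we shall see later that we are interested in values of $k$ such that $0 < k < 2\pi$. For this range

$$\bar f = \frac{\sum \cos(k x_\alpha + \epsilon)}{n} = \frac{\sin(\tfrac{1}{2} n k)\, \cos[\epsilon + \tfrac{1}{2} k (n - 1)]}{n \sin(\tfrac{1}{2} k)}.$$

Hence, if $Y''$ represents the projection of $Y'$ on the unit sphere,

$$Y''_\alpha = \lambda \left[ \cos(k x_\alpha + \epsilon) - \frac{\sin(\tfrac{1}{2} n k)\, \cos[\epsilon + \tfrac{1}{2} k (n - 1)]}{n \sin(\tfrac{1}{2} k)} \right],$$

where $\lambda$ is to be determined so that $\sum Y''^2_\alpha = 1$. Now

$$\sum Y''^2_\alpha = \lambda^2 \left[ \sum \cos^2(k x_\alpha + \epsilon) - \frac{\sin^2(\tfrac{1}{2} n k)\, \cos^2[\epsilon + \tfrac{1}{2} k (n - 1)]}{n \sin^2(\tfrac{1}{2} k)} \right],$$

and

$$\sum \cos^2(k x_\alpha + \epsilon) = \frac{n}{2} + \frac{1}{2}\, \frac{\sin nk\, \cos[2\epsilon + k(n - 1)]}{\sin k},$$

and hence

$$\lambda = \left[ \frac{n}{2} + \frac{1}{2}\, \frac{\sin nk\, \cos[2\epsilon + k(n - 1)]}{\sin k} - \frac{\sin^2(\tfrac{1}{2} n k)\, \cos^2[\epsilon + \tfrac{1}{2} k (n - 1)]}{n \sin^2(\tfrac{1}{2} k)} \right]^{-1/2},$$

the expression being continuous at $k = \pi$.

Then

$$Y''_\alpha = \lambda\, (\cos(k x_\alpha + \epsilon) - \bar f) = \lambda \cos(k x_\alpha + \epsilon) + \xi, \text{ say.}$$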
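The closed forms for $\bar f$ and $\lambda$ are easy to confirm numerically. The following sketch is illustrative only and its function names are hypothetical. It compares the direct sums over $x_\alpha = 0, 1, \dots, n - 1$ with the trigonometric closed forms, and also prints the large-$n$ value $(2/n)^{1/2}$ for comparison.

```python
import math


def f_bar_direct(k, eps, n):
    """Mean of cos(k x + eps) over x = 0, 1, ..., n - 1, by direct summation."""
    return sum(math.cos(k * x + eps) for x in range(n)) / n


def f_bar_closed(k, eps, n):
    """Closed form sin(nk/2) cos[eps + k(n-1)/2] / (n sin(k/2))."""
    return (math.sin(0.5 * n * k) * math.cos(eps + 0.5 * k * (n - 1))
            / (n * math.sin(0.5 * k)))


def lam_direct(k, eps, n):
    """lambda chosen so that the centred cosine vector has unit length."""
    fb = f_bar_direct(k, eps, n)
    return 1.0 / math.sqrt(sum((math.cos(k * x + eps) - fb) ** 2 for x in range(n)))


def lam_closed(k, eps, n):
    """lambda from the closed-form sums of cos^2 and of the mean correction."""
    sum_cos2 = n / 2.0 + 0.5 * math.sin(n * k) * math.cos(2.0 * eps + k * (n - 1)) / math.sin(k)
    correction = (math.sin(0.5 * n * k) ** 2 * math.cos(eps + 0.5 * k * (n - 1)) ** 2
                  / (n * math.sin(0.5 * k) ** 2))
    return 1.0 / math.sqrt(sum_cos2 - correction)


k, eps, n = 1.3, 0.7, 500
print(f_bar_direct(k, eps, n), f_bar_closed(k, eps, n))
print(lam_direct(k, eps, n), lam_closed(k, eps, n), math.sqrt(2.0 / n))
```

For $k$ away from 0, $\pi$ and $2\pi$ the two trigonometric correction terms stay bounded while $n/2$ grows, so $\lambda$ is close to $(2/n)^{1/2}$ when $n$ is large.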

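Regarding $k$ and $\epsilon$ as curvilinear coordinates of a point on the surface, we apply the formula

$$\sqrt{EG - F^2}\; dk\, d\epsilon$$

for the element of surface area, where

$$E = \sum \left( \frac{\partial Y''_\alpha}{\partial k} \right)^2, \qquad F = \sum \frac{\partial Y''_\alpha}{\partial k}\, \frac{\partial Y''_\alpha}{\partial \epsilon}, \qquad G = \sum \left( \frac{\partial Y''_\alpha}{\partial \epsilon} \right)^2.$$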

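In evaluating these summations, we shall need the following results: $\sum Y''_\alpha = 0$, $\sum Y''^2_\alpha = 1$, from which we obtain

(1) $\sum \cos(k x_\alpha + \epsilon) = -\dfrac{n \xi}{\lambda},$

(2) $\sum \cos^2(k x_\alpha + \epsilon) = \dfrac{1 + n \xi^2}{\lambda^2}.$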


Differentiating these relations, we have 
(3) Sitt sin (kXa + t) 


2 sin {kxa + «) 


dk\\J 
X X" 

= 

0« Vx/ 

X 


Bin- + ,) . Izri + J|,(l+,!i’) 


- + 1 [(_?^‘ + 1*) (»!■ + 1) 
SB, COB ihXa + e) sin (fee, + t) .= 


X*(l + nt) 

X« ■ X* 
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(7) 


S cos {kxa + t) sin {kxa + «) 


19/ 1 + nf \ 
2 Be \ / 




nfS. 


( 8 ) 


Now 


and 




- 5+-^ - s ^ (1 +«?■)+ i !+ ( 1 + 


dk 


2 X« ' ‘ ' ’2 X* 

, n^k^i I n^^fk 

2X* 2X=* X» 


= Xi cos (fca:„ + «) — Xx^ sin {kXa + «)+{* 


X” 


dK 

Bf 


= X, cos (/cx« + e) — X sin {kxu + «)+?, 


so that with the above definitions of E, F, G we obtain 

B = XfcS cos*(fcx„ + «) + X^S sin*(fcx„ + e) + - 2XX*Sx„ cos {kXa + *) 

■ sin {kXa + e) — 2X|fcSia sin {kx„ + e) + 2\kkk^ cos (kxa + «) 




A;j^ 


F = X.X^D cos° (fcXa + e) + X^SXc, sin“ (kx„ + t) + n^,^k 

-XXjS sin {kXa + e) cos {kXa + «) + 5<X»;S cos {kXa + e) — X$,2x„ sin(fcxa + «) 

+ XtJiS cos(fcXa + «) — XX.SXa sin (fcx„ + e) cos {kxa + e) — Xb2 sin {kxa + «) 

^ / 1 ^ Xi, , xV(n - 1) ^2 nX 7 jfc/« 

-X.X.(^^,j-^ + 4 +^X 


G = X®2 cos^ (fcxt, + e) + X^2 sin^ {kxa + e) + nf, — 2X|,2 sin {kxa + «) 
- 2XX,2 cos + e) sin {kXa + e) + 2X.^.2 cos (kx^ + e) 

= + nX“ - (nf X” + 1) - nX7l 

after using the relation ^ = — /X to eliminate 
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These relations give 

EQ~P^ = ("^ + ~ I + 1 

(9) X + nX“ - inn^ + 1) - nX^) 

(X.Xk _ x*< x’ri(n “ D , _ nXVJ.V 

V 2X2 2X 4 "^2 2 / • 

The area of the surface on which $Y''$ lies is

$$\iint \sqrt{EG - F^2}\; dk\, d\epsilon$$

over an appropriate range of values of $k$ and $\epsilon$, but it appears that this integral cannot be evaluated exactly. We shall obtain an approximation for large values of $n$, by obtaining approximations to $\lambda$, $\bar f$, and their derivatives, when $n$ is large.

The range of periods, $2\pi/k$, will be considered to vary from quantities greater than one up to half the range, that is $\tfrac{1}{2}(n - 1)$. This is chosen on the grounds that the intervals of time would be adjusted so that there would be no expectation of periods less than the interval, and that enough observations would be chosen to include at least two periods in the range. Although this supposes some a priori knowledge of the possible periods, it seems reasonable to expect that the experimenter would have at least a rough idea of the range of periods which might fit his data before attempting to fit a harmonic curve. This range gives a range of values of $k$ from $4\pi/(n - 1)$ to $2\pi(1 - \nu)$ where $\nu$ is arbitrarily small, but fixed. In all cases the epoch, $\epsilon$, varies from 0 to $2\pi$.

It is readily seen that the surface is traced out only once for this range of values of $k$, $\epsilon$, so that the problem in its approximate form is reduced to that of the evaluation of the definite integral

$$\int_0^{2\pi} \int_{4\pi/(n-1)}^{2\pi(1 - \nu)} \sqrt{EG - F^2}\; dk\, d\epsilon.$$

We shall obtain the approximations mentioned above, in the first place excluding from consideration values of $k$ in the neighbourhood of 0, $\pi$, $2\pi$, the integrals over small ranges including these values being obtained separately.

If $k$ is not in the neighbourhood of 0, $\pi$, $2\pi$, we note that

$$\frac{\sin(\tfrac{1}{2} n k)\, \cos[\epsilon + \tfrac{1}{2} k (n - 1)]}{\sin(\tfrac{1}{2} k)}$$

is a bounded function of $k$, the upper bound being independent of $k$, and at most equal to $|\operatorname{cosec} \tfrac{1}{2} k_1|$, where $k_1$ is the angle in the range considered nearest to 0, $\pi$, $2\pi$. Similarly the upper bound of

$$\frac{\sin(nk)\, \cos[2\epsilon + k(n - 1)]}{\sin k}$$

is at most $|\operatorname{cosec} k_1|$. Hence as $n$ is increased, we may expand $\lambda \sqrt{n}$ in ascending powers of $n^{-1}$. For large $n$, therefore, $\lambda = O(n^{-1/2})$, and is approximately $(2/n)^{1/2}$. Since differentiation with respect to $k$ introduces a multiplying factor $n$ in some of the terms, it follows that this is compensated for by the factor $\lambda^{-3}$ which occurs in the denominator of the derivative, and we may conclude that $\lambda_k = O(n^{-1/2})$. No such compensating factor $n$ occurs in the numerator of $\lambda_\epsilon$, and it is therefore of order $n^{-3/2}$. It may readily be seen without actually evaluating the derivatives, which are very long and unwieldy expressions, that
= O(n^), X .1 = = 0(n-^),h = 0(1), 

hk = 0(n),A, = 0(1),/= 0(0- 

We thus see that the term of highest order in $E$ is $\lambda^2 (n - 1) n (2n - 1)/12$.

The term of highest order in $G$ is $n \lambda^2 - 1$.

The term of highest order in $F$ is $\lambda^2 n (n - 1)/4$.

These are approximately constant for large $n$, and are equal to $n^2/3$, 1, $n/2$ to a first order of approximation. Hence

$$\sqrt{EG - F^2} \sim \frac{n}{\sqrt{12}}.$$
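Since the exact expressions for $E$, $F$ and $G$ are unwieldy, a numerical check of the approximation $\sqrt{EG - F^2} \sim n/\sqrt{12}$ may be useful. The sketch below is illustrative only and its names are hypothetical. It builds the unit vector $Y''$ directly from its definition, differentiates it with respect to $k$ and $\epsilon$ by central finite differences (a substitute for the analytic derivatives, chosen here for brevity), and compares the resulting area element with $n/\sqrt{12}$ at a value of $k$ well away from 0, $\pi$ and $2\pi$.

```python
import math


def unit_vector(k, eps, n):
    """Y'': the centred, normalised cosine evaluated at x = 0, 1, ..., n - 1."""
    c = [math.cos(k * x + eps) for x in range(n)]
    mean = sum(c) / n
    d = [v - mean for v in c]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


def area_element(k, eps, n, h=1e-6):
    """sqrt(EG - F^2) computed from central finite differences of Y''."""
    dk = [(p - m) / (2.0 * h)
          for p, m in zip(unit_vector(k + h, eps, n), unit_vector(k - h, eps, n))]
    de = [(p - m) / (2.0 * h)
          for p, m in zip(unit_vector(k, eps + h, n), unit_vector(k, eps - h, n))]
    E = sum(v * v for v in dk)
    G = sum(v * v for v in de)
    F = sum(p * q for p, q in zip(dk, de))
    return math.sqrt(E * G - F * F)


n = 2000
ratio = area_element(1.0, 0.3, n) / (n / math.sqrt(12.0))
print(ratio)  # should be close to 1 for large n
```

Near $k = 0$, $\pi$ or $2\pi$ the bounded trigonometric terms are no longer small compared with $n/2$, which is why those neighbourhoods are treated separately in the text.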


The range for $k$ may be broken up as follows:

(a) from $4\pi/(n - 1)$ to $\alpha/\sqrt{n}$, where $\alpha$ is a finite angle, independent of $n$,

(b) from $\alpha/\sqrt{n}$ to $\pi - \alpha/\sqrt{n}$,

(c) from $\pi - \alpha/\sqrt{n}$ to $\pi + \alpha/\sqrt{n}$,

(d) from $\pi + \alpha/\sqrt{n}$ to $2\pi - \alpha/\sqrt{n}$,

(e) from $2\pi - \alpha/\sqrt{n}$ to $2\pi(1 - \nu)$.

The method of procedure will be to show that in ranges (a), (c), (e) the integrand is of order $n$, and that since the ranges in all three cases are of order $n^{-1/2}$, the values of the integrals in those ranges are $O(n^{1/2})$, which is negligible in comparison with the contributions from (b) and (d), which are $O(n)$.

In (a), < A: < ^, we put fc = a = 4ir, and let 5 range from p to 

where p is a positive quantity defined by the relation {n — 1) = n Then 
X, /, are of orders and respectively. For this range of values of S, the 
orders of the derivatives are • 


\/c 

XkK 

X. 

Xt, 


i-s 

n 
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These being decreased, i t follows that the order of is not increased 

for any positive d, and -x / BG — F'^ — 0(n) as before. 

In (c), IT - ^ < & < TT + we put A: = IT d=^j , according and 

consider 0 < 5 < The orders of the derivatives are as stated in (a) above for 

this range. The remainder of the range t — - <A:<7r + -is such that the 

71 71 

values o f the deri vatives are of orders as stated with 5 = 0, while X = 0(n.~*). 
Thus V EG — F'^ = 0(n) throughout. 

In (e), 2t ~ < k < 27r(l — »), we put k == 2ir — , and consider 0 < 

w* n ~ 

5 < |. In this range the orders of the derivatives are as in (a). In the remainder 
of the range, 2% ~ ~ < k < 2ir(l — v), the orders of the derivatives are as in 

7% 

(a) with 5 = 0, so that -s/EG — F^ = 0(n). 

As the ranges (b) and (d) are not independent of n, it remains to be shown 

that this fact does not affect the final result. We consider, therefore, k = ~ 


and k = r 




where ^ < 5 < 1, and since, as in (0) the second and third 


terms in the denominator of X are 0(n* ') and or 0(n~^) respectively, 

$X \sim 1$, while the derivatives have values as in case (a). Thus, in these
ranges, $\sqrt{EG - F^2} \sim n/\sqrt{12}$ throughout. Thus we may conclude in all cases
that $\sqrt{EG - F^2} = O(n)$.

The surface area

$= \int_0^{2\pi} dt \Big[\int_{(a)} + \int_{(b)} + \int_{(c)} + \int_{(d)} + \int_{(e)}\Big] \sqrt{EG - F^2}\, dk$

$= \int_0^{2\pi} dt \Big[\int_{(b)} + \int_{(d)}\Big] \sqrt{EG - F^2}\, dk + \int_0^{2\pi} dt \Big[\int_{(a)} + \int_{(c)} + \int_{(e)}\Big] \sqrt{EG - F^2}\, dk,$

each inner integral being taken over the range of k so labelled.

In the first two ranges, $\sqrt{EG - F^2} \sim n/\sqrt{12}$.

In the last three ranges, $\sqrt{EG - F^2} = O(n)$ and therefore the integral $= O(n^{1/2})$.
Thus the area is equal to

(10)  $2\pi^2 n/\sqrt{3}$ + terms of lower order.

In the case of fitting a linear regression with 3 independent variates, the
distribution of R is well known to be

$\dfrac{\Gamma[\tfrac12(n-1)]}{\Gamma(\tfrac32)\,\Gamma[\tfrac12(n-4)]}\; 2R^2 (1 - R^2)^{\frac12(n-6)}\, dR.$

It may readily be seen by a repetition of the argument used in the first paragraph
that this expression could be derived by considering the volume of a tube in
spherical space of (n − 2) dimensions, in which the base surface is a 2-dimensional
unit sphere of area 4π. We are assuming that the first approximation
to the volume of a tube is equal to the area of the surface multiplied by a fixed
function of θ. If, therefore, we divide this expression by 4π, and take R sufficiently
close to 1, or $\theta = \cos^{-1} R$ sufficiently close to zero, we shall obtain the
expression by which to multiply the surface area, (10), in order to obtain the
first approximation to the frequency function of R.

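Using Stirling's approximation, we have

$\Gamma[\tfrac12(n-1)] \sim \sqrt{2\pi}\, e^{-\frac12(n-1)}\, [\tfrac12(n-1)]^{\frac12(n-1)-\frac12}$

and $\Gamma[\tfrac12(n-4)] \sim \sqrt{2\pi}\, e^{-\frac12(n-4)}\, [\tfrac12(n-4)]^{\frac12(n-4)-\frac12}$.

The ratio of these $\sim (\tfrac12 n)^{3/2}$.

Hence the multiplying constant is approximately $n^{3/2}/\sqrt{2\pi}$. Substituting
$R = \cos\theta$ in the frequency function divided by this constant, we obtain
$2\cos^2\theta\, \sin^{n-6}\theta\, \sin\theta\, d\theta$, giving $2\theta^{n-5}\, d\theta$ as the first approximation.

Hence the approximate frequency function for the quantity θ in the case of
periodogram analysis is

$\dfrac{n^{3/2}}{\sqrt{2\pi}}\; 2\theta^{n-5}\, d\theta \cdot \dfrac{2\pi^2 n/\sqrt{3}}{4\pi} = 2^{-1/2}\, n^{5/2}\, \pi^{1/2}\, 3^{-1/2}\, \theta^{n-5}\, d\theta.$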

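Thus the first approximation to the probability that θ should be as great or
greater is

$2^{-1/2}\, n^{5/2}\, \pi^{1/2}\, 3^{-1/2}\, \dfrac{\theta^{n-4}}{n-4}$, or $\left(\tfrac16 \pi n^3\right)^{1/2} \theta^{n-4}$
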
approximately.

The approximations which have been introduced have been forced upon us
by the limitations of the mathematical machinery involved. It must be admitted
that those approximations are not those which the experimenter would
choose, for the following obvious reason. If we are testing the null hypothesis
that the population correlation is zero, for large values of n the sample correlation
will approach its expectation value, namely zero, and we shall in general be
interested in values of R which are small, and corresponding values of θ in the
neighborhood of π/2. This situation is not provided for in this investigation.
It may be, however, that there exists a large correlation in the population,
and that owing to the large number in the sample the value of R calculated is
near this value. Provided that this population correlation is sufficiently close
to unity, the value of θ will be small enough to apply the distribution obtained
above, and in such a case will enable us to reject the null hypothesis when the
probability calculated from the distribution is sufficiently small.

University of London.



ON THE APPLICATION OF THE Z-TEST TO RANDOMIZED BLOCKS 

By M. D. McCarthy 

1. Introduction. When a series of experiments is performed with the object 
of measuring some quantity, it is implicitly postulated that the quantity in 
question has a “true value,” which is theoretically obtainable as the result of an
infinite repetition of the experiment under the standard conditions. In certain 
experiments, especially those of physical and chemical science, the materials and 
the methods employed are subject to such accurate control by the experimenter 
that he can repeat his experiment again and again with the “essential” factors 
kept constant, and with biassed errors eliminated. This repetition gives a series 
of observations of the “true value” in question subject only to random errors. 
All that is needed, usually, to increase the accuracy of the estimate of the “true 
value” is to continue the repetition of the experiment. Not only does such a 
repetition make the estimate more exact but it also provides an estimate of the 
degree of accuracy present, permits a comparison between different quantities 
and makes it possible to test various hypotheses as to their relative values. 

In many cases which arise, notably in biological and social science and in 
dealing with data provided by modern mass-production methods, it is a practical 
impossibility to repeat an experiment under the same essential conditions. The 
material available is definitely non-homogeneous with regard to at least some 
of the qualities influencing the results. In testing, for instance, a number of 
varieties of some plant, to find which gives the best yield, it is possible to
guarantee that, to a high degree of accuracy, all the varieties are cultivated alike.
If a relatively small area is covered by the experimental plots, it can be said 
that all the varieties experience the same climatic conditions and it is not diffi- 
cult to ensure that they are all treated alike as to measurement of produce and 
so on. It is, however, practically impossible to make the plots, on which the 
varieties are grown, homogeneous as regards fertility of the soil and, even if 
this were possible, it would partially defeat the purpose of the experiment which 
is to test the varieties over a certain limited range of soil types. In a similar 
way in many other fields of biological or social experiments a similar non- 
homogeneity of the experimental material exists. 

In experimenting with homogeneous materials, where the conditions of the 
whole series of experiments are the same, the differences which occur between 
the theoretical “true value” and the observations are explained as being due 
to a multiplicity of causes outside the control of the experimenter and of such a 
nature that their incidence varies “randomly” from experiment to experiment. 
It is the fact that certain fundamental factors influencing the results are
definitely non-random in their incidence which differentiates the experiments with
non-homogeneous material from the others, and it is by artificially introducing
randomization, as suggested by Fisher [1, 2, 3], that such experiments are made
amenable to the usual error laws.

For convenience, in what follows, the word “variety” will be used when
speaking of a single object of those under test, whether it actually be a variety 
of some plant, a manurial treatment, a method of feeding or anything else of 
the sort. For instance, if five varieties and three manurial treatments are 
being tested in the same experiment, a “variety” would be any one of the fifteen 
combinations of an actual variety under test with a manure. The word “plot” 
will be used for that portion of the non-homogeneous material which is required 
for the performance of an experiment on a single “variety,” and the term “yield” 
will be applied to the value of the observed quantity obtained as the result of 
testing a “variety” on a single “plot.” The plots are, of course, equalized with 
respect to “size,” or whatever similar property would influence the test. 

2. Randomized Blocks. Suppose that there are s varieties to be tested and 
that the necessary replication is attained by testing each variety on n separate 
plots. That the plots on which each variety is tested form a random sample 
of the material available is guaranteed by assigning each of the s varieties to n 
of the available ns plots at random, that is, as the result of a physical random 
experiment with cards, dice, or the like. This method of randomization may 
be so employed that no restrictions are put on the plots to which the varieties 
are assigned, or it may be further refined in different ways so that, while pre- 
serving the random nature of the assignment, certain restrictions may be placed 
on it. Such a method of randomization with restrictions is the method known 
as “randomized blocks.” 

The basic idea is that compact “blocks” of the non-homogeneous material are, 
probably, much more uniform than the material taken as a whole. Conse- 
quently, the material is first divided into n such “blocks,” as compact and 
uniform as possible, each block containing s equal plots. Each of the s varieties 
under test is assigned to a single plot in every block and randomness is attained 
by making the assignment of the varieties to the plots in each block as the 
result of a separate random experiment. Thus the n plots to which each variety
is assigned do actually form a random sample of the non-homogeneous material 
with the restriction that to each plot of any variety corresponds a plot of any 
other variety from the same block. 
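The restricted randomization just described may be sketched in a few lines of Python. This is an illustration only and not part of the original argument: the function name `randomized_blocks`, the `seed` argument, and the use of a pseudo-random generator in place of the physical experiment with cards or dice are all assumptions made for the example.

```python
import random

def randomized_blocks(n_blocks, n_varieties, seed=None):
    """Return layout[j][l] = variety (1..s) placed on plot l of block j.

    Every block receives each variety exactly once, and the order inside
    each block comes from a separate, independent shuffle.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(range(1, n_varieties + 1))
        rng.shuffle(block)  # one separate random experiment per block
        layout.append(block)
    return layout

if __name__ == "__main__":
    for j, block in enumerate(randomized_blocks(4, 5, seed=1), start=1):
        print(f"block {j}: {block}")
```

Each row of the result is a permutation of the s varieties, which is exactly the restriction that distinguishes randomized blocks from unrestricted assignment of the ns plots.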

3. Mathematical Formulation. $X_{jl(k)}$ denotes the “true yield” of the kth
variety which would be obtained by testing it on the lth plot in the jth block.
$k = 1, 2, \ldots, s$ denotes the number by which the variety is known, $l = 1, 2, \ldots, s$
the order-number of the plot in the block and $j = 1, 2, \ldots, n$ the
number of the block. Following Neyman [4, p. 110] we define the “true yield,”
again with particular reference to agricultural experiments, as

“Suppose that the experiment is repeated indefinitely without any change of
vegetative conditions or of arrangement so that the kth variety is always tested
on the plot (j, l). The yields from this plot will form a population, say $\pi_{jl(k)}$,
and $X_{jl(k)}$ is defined as the mean of this population.”

Thus, in any block, there are $s^2$ different possible populations with corresponding
“true values,” but in any single experiment on that block observations
will be obtained from only s of the $s^2$ possible populations. To distinguish
those populations for which an observation is available from those which are
entirely hypothetical, $X_{j(k)}$ will denote the “true yield,” as already defined, of
the kth variety on the plot to which it has been assigned in the jth block. Since
this assignment has been carried out as the result of a random experiment the
“true yield” $X_{j(k)}$ is itself a random variable; it is randomly selected from the set of s
possible values $X_{j1(k)}, X_{j2(k)}, \ldots, X_{js(k)}$.

Using the dot notation to denote the mean of a quantity taken over all values
of the letter replaced by the dot, it is clear that

$X_{jl(k)} = X_{..(k)} + [X_{j.(k)} - X_{..(k)}] + [X_{jl(k)} - X_{j.(k)}]$

$\qquad\; = X_{..(k)} + B_{jk} + u_{jl(k)},$
and

$X_{j(k)} = X_{..(k)} + [X_{j.(k)} - X_{..(k)}] + [X_{j(k)} - X_{j.(k)}]$

$\qquad\; = X_{..(k)} + B_{jk} + \eta_{jk},$

where

$B_{jk} = X_{j.(k)} - X_{..(k)}, \qquad u_{jl(k)} = X_{jl(k)} - X_{j.(k)}$

and

$\eta_{jk} = X_{j(k)} - X_{j.(k)}.$

Obviously

$\sum_{j=1}^{n} B_{jk} = 0$ and $\sum_{l=1}^{s} u_{jl(k)} = 0$

from their definitions, while $\eta_{jk}$ is a random variable, with zero expectation,
selected from the sequence $u_{j1(k)}, u_{j2(k)}, \ldots, u_{js(k)}$. Neyman (loc. cit.) calls
$\eta_{jk}$, thus defined, the “soil error” of the kth variety when tested on its assigned
plot in the jth block. The actual yield observed when the kth variety is tested
on its assigned plot in the jth block is $x_{jk}$ and the difference $x_{jk} - X_{j(k)} = \epsilon_{jk}$
is termed the “technical error.” Clearly

(1)  $x_{jk} = X_{..(k)} + B_{jk} + \eta_{jk} + \epsilon_{jk}.$

Both “soil error” and “technical error” enter into any comparisons which may
be made and it is well known that the major source of error in, for instance,
agricultural experiments is that due to the heterogeneity of the soil. As regards
the relative magnitudes of the two errors, that of course depends on the experiment
in question, but Fisher [5] has stated that in an agricultural uniformity
trial (i.e. when the same variety is tested on all the plots) yields from plots of
1/40th of an acre frequently vary sufficiently among themselves, owing to soil
heterogeneity, so as to give a standard deviation of ten per cent of the mean 
yield, while the inevitable random errors in treating the plots can be kept down 
to a much lower figure. By confining the randomization to a “block” of the 
material, which comprises only a relatively small compact portion of the whole 
material under test, the effects of soil heterogeneity may be much decreased. 
It appears, however, that it may very often be an unwarranted simplification to 
consider that the “true yield” of a variety is the same for all plots of a given 
block. 

The two types of “error” are random variables of altogether different properties.
Both have zero expectation and may be considered as independent of one
another in the probability sense. It, therefore, appears reasonable to assume
that $\epsilon_{jk}$ is independent both of the “technical error” in any other observation
and of the η's. On the other hand $\eta_{jk}$ is a random variable selected from the
sequence

(2)  $u_{j1(k)}, u_{j2(k)}, \ldots, u_{js(k)}$

and since, if $\eta_{jk}$ has the value $u_{jl(k)}$, $\eta_{jm}$ is free to assume any one of the values
$u_{j1(m)}, u_{j2(m)}, \ldots, u_{js(m)}$ except $u_{jl(m)}$, it is clear that $\eta_{jk}$ and $\eta_{jm}$ are not
independent. In the case of $\eta_{j'k}$ and $\eta_{j''m}$ where $j' \neq j''$, the random variables are
drawn as the result of two separate random experiments from different sequences
of the type (2). Obviously this means that the “soil errors” for different blocks
are independent for either the same or different varieties. Writing E for the
expected value, or the mean value in repeated experiments, since

$\sum_{l=1}^{s} u_{jl(k)} = \sum_{l=1}^{s} u_{jl(m)} = 0,$

the variance of $\eta_{jk}$ is

(3)  $\sigma'^2_{jk} = s^{-1} \sum_{l=1}^{s} u^2_{jl(k)}$

and also

(4)  $E[\eta_{jk}\eta_{jm}] = -\{s(s-1)\}^{-1} \sum_{l=1}^{s} u_{jl(k)} u_{jl(m)}.$
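Relations (3) and (4) can be checked exactly for a small block by enumerating every assignment of varieties to plots. The sketch below is illustrative only: the function name and the numerical deviations are invented for the example, and the formulas tested are (3) and (4) as we read them, namely a variance of $s^{-1}\sum_l u^2_{jl(k)}$ and a product expectation of $-\{s(s-1)\}^{-1}\sum_l u_{jl(k)}u_{jl(m)}$.

```python
from fractions import Fraction
from itertools import permutations

def soil_error_moments(u_k, u_m):
    """Exact E[eta_k^2] and E[eta_k * eta_m] over all s! plot assignments.

    u_k[l] and u_m[l] are the within-block deviations of two varieties on
    plot l; each list must sum to zero.
    """
    s = len(u_k)
    assert sum(u_k) == 0 and sum(u_m) == 0
    total_sq = Fraction(0)
    total_cross = Fraction(0)
    count = 0
    # perm[v] is the plot given to variety v; the two varieties of interest
    # are taken to be numbers 0 and 1, the others only fill the block.
    for perm in permutations(range(s)):
        eta_k = u_k[perm[0]]
        eta_m = u_m[perm[1]]
        total_sq += eta_k * eta_k
        total_cross += eta_k * eta_m
        count += 1
    return total_sq / count, total_cross / count

# Hypothetical deviations for a block of s = 4 plots.
u_k = [Fraction(x) for x in (3, -1, -2, 0)]
u_m = [Fraction(x) for x in (1, 2, -4, 1)]
s = len(u_k)

var_k, cov_km = soil_error_moments(u_k, u_m)
formula_3 = sum(x * x for x in u_k) / s
formula_4 = -sum(a * b for a, b in zip(u_k, u_m)) / (s * (s - 1))
print(var_k == formula_3, cov_km == formula_4)
```

Rational arithmetic is used so that the comparison is exact rather than subject to rounding.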

Using (1), (3) and (4) it follows that

(5)  $E(x_{jk}) = X_{..(k)} + B_{jk} = a_{jk},$

say,

(6)  $E[(x_{jk} - a_{jk})^2] = E[(\eta_{jk} + \epsilon_{jk})^2] = \sigma'^2_{jk} + \sigma''^2_{jk} = \sigma^2_{jk}.$

The expectations of the various product terms on the right-hand sides of these
equations vanish except in the case of the last one. If $j' \neq j''$ it too vanishes,
whatever the values of k and m, and it follows that the correlation of the observed
yields of any two varieties, or of the same variety, obtained from different blocks
is zero. It is clear, however, that such is not the case when the yields are obtained
from plots on the same block. Denoting by $\rho_{j(km)}$ the coefficient of
correlation between $x_{jk}$ and $x_{jm}$ and using (4)

(7)  $\rho_{j(km)} = \rho_{j(mk)} = -\dfrac{\sum_{l=1}^{s} u_{jl(k)} u_{jl(m)}}{s(s-1)\,\sigma_{jk}\sigma_{jm}}$

when $k \neq m$, while, of course, $\rho_{j(kk)} = 1$.
It may be noted that even when two sequences such as in (2) are identical
the correlation $\rho_{j(km)}$ is not zero. In this case, when the varieties react in
exactly the same way to variations in fertility within a block, $\sigma_{jk} = \sigma_{jm} = \sigma_j$,
say, and

(7a)  $\rho_{j(km)} = -(s-1)^{-1}\{1 - \sigma''^2_j/\sigma^2_j\}.$


Then the coefficient of correlation is negative and depends only on the relative 
magnitude of the technical and soil errors for the block in question, and on the 
number of plots in the block. In a given block it is greatest in absolute magni- 
tude when the technical error is zero, or at any rate negligible with respect to 
the soil error which, of course, is usually uncontrollable. In order to have zero 
correlation between the yields of every pair of varieties it must be assumed 
either that (a) there is such a complete lack of relationship between the ways 
in which the various varieties react to the differences of fertility within a block
that for each pair of varieties k and m all the products such as $\sum_{l=1}^{s} u_{jl(k)} u_{jl(m)}$
vanish identically even though the u's themselves are not zero, an assumption
that lacks plausibility, or else that (b) all members of each sequence of the 
type (2) are zero. This latter assumption means that no allowance whatsoever 
is needed for variations of fertility within a block. Once variation of fertility 
within a block is admitted it appears only reasonable that it should be taken 
into account and the effect of the resulting correlations on any test concerning 
the yields of different varieties examined. 
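The step from (4) and (6) to (7a) may be written out as follows. This is a sketch under stated assumptions: the primes are our reading of the damaged notation, with $\sigma'^2_j$ the soil-error variance of (3), $\sigma''^2_j$ the technical-error variance, and $\sigma^2_j = \sigma'^2_j + \sigma''^2_j$ as in (6); the two sequences of type (2) are taken to be identical, $u_{jl(k)} = u_{jl(m)} = u_{jl}$, and the technical errors are independent of each other and of the soil errors.

```latex
\begin{align*}
E[\eta_{jk}\eta_{jm}]
  &= -\frac{1}{s(s-1)}\sum_{l=1}^{s} u_{jl}^{2}
   = -\frac{\sigma'^{2}_{j}}{s-1}, \\
\rho_{j(km)}
  &= \frac{E[\eta_{jk}\eta_{jm}]}{\sigma_{j}^{2}}
   = -\frac{1}{s-1}\cdot\frac{\sigma_{j}^{2}-\sigma''^{2}_{j}}{\sigma_{j}^{2}}
   = -\frac{1}{s-1}\left\{1-\frac{\sigma''^{2}_{j}}{\sigma_{j}^{2}}\right\}.
\end{align*}
```

For instance, with s = 4 plots in the block and the technical error accounting for half of the total variance, the correlation would be −1/6; it lies between −1/(s − 1), reached when the technical error is negligible, and zero.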

Cramér has shown [6, 7] that if the sum of two independent random variables
be normally distributed each variable must itself follow the normal law.
Strictly, therefore, it cannot be correct to apply normal theory to the random
variables $x_{jk}$ in the mathematical model elaborated above, for, though $\epsilon_{jk}$ may
readily be assumed normally distributed, $\eta_{jk}$ can obviously take only a finite
number, s, of values and, consequently, as its distribution cannot be normal, it is
impossible that $x_{jk}$ can be exactly normally distributed either. However, as a
first approximation, taking into account the correlations, it will be assumed
that the yields from any block form a set of single observations of the variables
in an s-variate normal distribution. Further, for the sake of simplicity, it will
be assumed that the variances and covariances of the populations appropriate
to the different blocks are the same. Dropping the distinguishing j's, the
variances of the yields, as in (6), are defined by $\sigma_k^2 = \sigma'^2_k + \sigma''^2_k$ and $\rho_{km}$ is written
for $\rho_{j(km)}$ in (7).

We define $y_{jk}$ and $A_{km}$ by

(8)  $y_{jk} = x_{jk} - a_{jk}$

and

(9)  $A_{km} = A_{mk} = \dfrac{\Delta_{km}}{2\sigma_k\sigma_m\Delta},$

where Δ is the s-rowed determinant $|\rho_{km}|$, symmetrical about its principal
diagonal, $\Delta_{km}$ the cofactor of $\rho_{km}$ in Δ and A is written for $|A_{km}|$, the determinant
of the positive definite matrix $\|A_{km}\|$. Then since the interblock
covariances are zero, the elementary probability law for the whole set of ns y's
is given by

(10)  $p\{y_{jk}\} = (A/\pi^s)^{n/2} \exp\Big\{-\sum_{j}\sum_{k,m} A_{km}\, y_{jk}\, y_{jm}\Big\}.$

It may be noted that j and l where they occur run through all integral values
from 1 to n while k and m take values from 1 to s. A sign such as $\sum_{k,m}$ means
that the summation is taken over all the pairs of values of k and m, the term
(m, k) being taken as distinct from the term (k, m) and including the terms in
which k = m. $\sum_{k \neq m}$ implies a similar summation with the omission of terms in
which k = m.

The distribution law (10), or a similar one substituting the x's for y's from (8),
takes into account also cases in which, though the correlations may be zero,
the variances of the different variety yields differ.


4. The Z-Test. If $\{x_q\}$, $q = 1, 2, \ldots, f_1$, is a set of $f_1$ mutually independent
random variables each of which follows the same normal law with zero mean
and variance $\sigma_1^2$, and if $u_1 = \sum_{q=1}^{f_1} x_q^2$, then the distribution law for $u_1$ is, $u_1 > 0$,

(11)  $p(u) = \{a^{f/2}/\Gamma(\tfrac12 f)\}\, u^{\frac12 f - 1} e^{-au}$

with $f = f_1$ and $a^{-1} = 2\sigma_1^2$. If also $u_2 = \sum_{r=1}^{f_2} y_r^2$, where $\{y_r\}$, $r = 1, 2, \ldots, f_2$,
is another set of mutually independent random variables each of which is normally
distributed with zero mean and variance $\sigma_2^2$, then the distribution law of
$u_2$ is (11) with $a^{-1} = 2\sigma_2^2$ and $f = f_2$. If, in addition to the independence of
the variables within each set, there is also independence between the sets,
then $u_1$ is independent of $u_2$ and the distributions of different functions of $u_1$
and $u_2$ used as criteria may be obtained. The one originally proposed in this
connection was z, defined by

$z = \tfrac12 \log_e (f_1 u_2 / f_2 u_1) - \log_e (\sigma_2/\sigma_1)$

and its distribution law is [8, 9, 10]

(12)  $p(z) = \dfrac{2 f_1^{\frac12 f_1} f_2^{\frac12 f_2}}{B(\tfrac12 f_1, \tfrac12 f_2)} \cdot \dfrac{e^{f_2 z}}{(f_1 + f_2 e^{2z})^{\frac12 (f_1 + f_2)}}.$

Any other single-valued, monotone function of $u_2/u_1$ would, when $\sigma_1 = \sigma_2$,
as a criterion, be equivalent to z. $F = e^{2z} = f_1 u_2 / f_2 u_1$, $v = u_2/u_1$ and
$w = u_2/(u_1 + u_2)$ have been adopted as criteria and their distribution laws are
readily deduced from (12). All these criteria are equivalent in providing control
of “errors of the first kind” [11, 12], that is, the risk of rejecting a hypothesis
tested when true. As usual the procedure is to select arbitrarily in advance a
certain “level of significance,” say ε = 0.05, 0.01 etc., and, assuming the hypothesis
tested is true, to determine the value of the criterion, say the value
$z_0$ of z, such that

(13)  $P\{z > z_0\} = \int_{z_0}^{\infty} p(z)\, dz = \epsilon.$


If the sample of observations gives a value of $z > z_0$, H is rejected; if $z < z_0$,
H is accepted. It is merely a matter of convenience which of the criteria
z, F, v or w is used and tables are available to facilitate numerical work. Tables
for z and F are given by Fisher [2], Fisher and Yates [13] and Snedecor [14],
while for w Tables of the Incomplete Beta Function [15] may be used.
Though no tables are directly available for v it is the simplest to use in theoretical
discussion and in subsequent sections it is its distribution law, and not
that of z, which will be considered. The latter may, of course, be readily
deduced.
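The equivalence of the four criteria can be made concrete with a short Python sketch. It is illustrative only: the function name `criteria` is invented, $\sigma_1 = \sigma_2$ is assumed (so the second logarithm in z drops out), and w is taken as $u_2/(u_1 + u_2)$, which is our reading of a damaged formula.

```python
import math

def criteria(u1, u2, f1, f2):
    """Return (z, F, v, w) for sums of squares u1, u2 on f1, f2 degrees of
    freedom, assuming sigma_1 = sigma_2."""
    v = u2 / u1
    F = f1 * u2 / (f2 * u1)
    z = 0.5 * math.log(F)
    w = u2 / (u1 + u2)
    return z, F, v, w

# Hypothetical sums of squares: u1 on 12 degrees of freedom, u2 on 3.
z, F, v, w = criteria(12.0, 9.0, 12, 3)
print(f"z={z:.4f}  F={F:.4f}  v={v:.4f}  w={w:.4f}")
```

Since F = (f1/f2)·v, z = ½ log F and w = v/(1 + v) are all increasing functions of v, a critical value for any one of them translates directly into a critical value for the others.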

Consider the distribution law (10) with $y_{jk}$ replaced by $x_{jk} - a_{jk}$ when
$\rho_{km} = 0$ ($k \neq m$) and $\sigma_k = \sigma_m = \sigma$, i.e., all the observations are normal and
independent with the same variance. Writing

(14)  $u_1 = \sum_{j,k} (x_{jk} - x_{.k} - x_{j.} + x_{..})^2$

(15)  $u_2 = \sum_{j,k} (x_{.k} - x_{..})^2 = n \sum_{k} (x_{.k} - x_{..})^2$

(16)  $u_3 = \sum_{j,k} (x_{j.} - x_{..})^2 = s \sum_{j} (x_{j.} - x_{..})^2$

then it is readily seen that

$u_1 + u_2 + u_3 = \sum_{j,k} (x_{jk} - x_{..})^2.$

Now if $a_{jk}$ may be put in the form $M + B_j + V_k$ with $\sum_j B_j = \sum_k V_k = 0$
then $u_1$ is distributed as in (11) with $f = (n-1)(s-1)$. If, in addition to the
additive assumption, $V_k = 0$ for all values of k then $u_2$ follows the same law
independently of $u_1$, and with $f = s - 1$. Similarly if $B_j = 0$ for all values
of j, $u_3$ has the same distribution law with $f = n - 1$. It may be shown [16]
that if $a_{jk} = M$ for all values of j and k the three quantities $u_1$, $u_2$ and $u_3$ follow
independently the law (11) with suitable values for f, and then the corresponding
values of z follow the law (12).

Making the assumption of additivity for $a_{jk}$, of which, incidentally, the
correctness or adequacy cannot be tested without more than one set of ns
observations of the variables, the z-test may be used to determine whether or
not there is a “block effect” or a “variety effect,” i.e., whether $B_j = 0$ or $V_k = 0$
for all values of j and k. For instance to test the hypothesis $V_k = 0$, $k = 1, 2, \ldots, s$,
$z = \tfrac12 \log_e \{(n-1)u_2/u_1\}$ is calculated from the observations and
the hypothesis is rejected if $z > z_0$ where $z_0$ is found from Fisher's tables corresponding
to a suitable value of ε in (13). Otherwise the hypothesis is accepted.
This is the usual method of applying the z-test to randomized blocks.
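The usual computation may be sketched as follows. The yields are made-up figures for n = 4 blocks and s = 3 varieties, and the function name is invented; the sums of squares are those of (14), (15) and (16) as we read them, and z is ½ log_e{(n − 1)u_2/u_1}.

```python
import math

def sums_of_squares(x):
    """x[j][k] is the yield of variety k in block j (n blocks, s varieties).
    Returns (u1, u2, u3): residual, variety and block sums of squares."""
    n, s = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * s)
    block_mean = [sum(row) / s for row in x]
    variety_mean = [sum(x[j][k] for j in range(n)) / n for k in range(s)]
    u1 = sum((x[j][k] - variety_mean[k] - block_mean[j] + grand) ** 2
             for j in range(n) for k in range(s))
    u2 = n * sum((m - grand) ** 2 for m in variety_mean)
    u3 = s * sum((m - grand) ** 2 for m in block_mean)
    return u1, u2, u3

yields = [  # hypothetical data: rows are blocks, columns are varieties
    [20.1, 22.4, 19.8],
    [18.7, 21.9, 18.2],
    [21.3, 23.0, 20.5],
    [19.4, 22.8, 19.9],
]
n, s = len(yields), len(yields[0])
u1, u2, u3 = sums_of_squares(yields)
grand = sum(map(sum, yields)) / (n * s)
total = sum((value - grand) ** 2 for row in yields for value in row)
z = 0.5 * math.log((n - 1) * u2 / u1)
print(f"u1={u1:.4f} u2={u2:.4f} u3={u3:.4f} total={total:.4f} z={z:.4f}")
```

The three sums of squares add up to the total sum of squares about the general mean, and the value of z obtained would then be compared with the tabulated $z_0$ for (s − 1) and (n − 1)(s − 1) degrees of freedom.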

The problem before us now is to consider what happens to such a test when
$\sigma_k \neq \sigma_m$ and $\rho_{km} \neq 0$ in (10), and the hypotheses to be tested must be related
to (1) and (5). As already stated this method of testing hypotheses controls,
at a suitable level, the risk of rejecting the hypothesis when it is true. A 
complete examination of the application of any criterion as the test of a sta- 
tistical hypothesis should involve, also, investigation of “errors of the second 
kind,” i.e., the risk of accepting the hypothesis when some alternative is true. 
That is to say such an examination should involve a study of the “power func- 
tion of the test” [17, 18, 19], and this would require a knowledge of the proba- 
bility distribution of the criterion when the hypothesis tested is not true. In 
this paper, however, attention will be confined entirely to "errors of the first 
kind.” 

5. Hypotheses Tested. In order that

(14a)  $u_1 = \sum_{j,k} (y_{jk} - y_{.k} - y_{j.} + y_{..})^2$

and

(15a)  $u_2 = n \sum_{k} (y_{.k} - y_{..})^2$

may be true it is sufficient that

$a_{jk} - a_{.k} - a_{j.} + a_{..} = X_{j.(k)} - X_{..(k)} - X_{j.(.)} + X_{..(.)}$

and

$a_{.k} - a_{..} = X_{..(k)} - X_{..(.)}$

should both be zero in every case. It has been suggested by Neyman [4] that it
would be desirable to test the hypothesis that $X_{..(k)}$ is independent of k, i.e.
that the average of the true yields over the whole field is the same for all
varieties. He suggests that the variations in the responses of the different
varieties within the field are relatively unimportant so that, while allowing for
the effect of the variations in fertility within the field on the various distribution
laws, it is the average over the whole field which should be tested. The functions
$u_1$ and $u_2$ will not test this hypothesis for, in order that they may have
the same expectation, not only must $X_{..(k)}$ be independent of k but also $X_{j.(k)}$
must be independent of k for every j. Consequently one of the hypotheses
tested here is that $X_{j.(k)} = X_{j.(.)}$, and therefore, of course, $X_{..(k)} = X_{..(.)}$, for
every j and k, i.e. that the mean of the true yields over all the blocks is the same
for all varieties while, by using (10), we make allowance for the variations in
fertility over each block and for the resultant correlations introduced. We shall
not consider $u_3$ from (16) as we are interested only in the presence or absence
of a “variety effect.”

It appears that two other hypotheses lead to results which are particular cases
of the above. If we test whether the true yield on every plot is the same for
all varieties, i.e. that $X_{jl(k)}$ is independent of k, then, assuming the hypothesis
tested is true, the varieties all react in the same way to the variations of fertility
within each block and in (10) $\sigma_k = \sigma_m = \sigma$, say, while $\rho_{km} = \rho$. On the
other hand if we neglect all the variations in fertility within each block all the
correlations vanish and $\sigma_k = \sigma_m = \sigma$. The hypothesis tested then is that
either $X_{jl(k)}$ or, what is the same thing, $X_{j.(k)}$ is independent of k.

It does not appear that the assumption of normality need cause any difficulty. 
E. S. Pearson [20] has examined the effect of skewness on the parent popula- 
tions and by carrying out sampling experiments has concluded that even with 
skew populations “... it seems probable that the more elaborate forms of
analysis of variance are also of fairly wide application, provided that the number 
of degrees of freedom apportioned to the residual variation is not too small.” 
A further investigation by Eden and Yates [21] was also designed to test the 
effect of skewness, but the negative result there obtained was to be expected 
owing to the amalgamation of the observations into groups. It appears that 
the effect of skewness in the original populations will not have very much 
effect on the distribution of z. 

Welch has examined [22] Randomized Blocks and Latin Square experiments
from the “randomization” point of view. In the case of randomized blocks,
in terms of the notation used above, he has taken $\epsilon_{jk} = 0$ or, expressed in
another way, he has assumed that the actual observed yield in any plot is the
“true yield” on that plot of the particular variety tested on the plot. The
hypothesis he is then testing is that $X_{jl(k)}$, or, what is the same thing for him,
$x_{jl(k)}$ is independent of k. Taking the $(s!)^n$ different ways in which the varieties
may be tagged on to the different yields he has considered the $(s!)^n$ different
values of what we have called w and he has compared the finite discrete distribution
so obtained with that given by normal theory. Getting E(w) and the variance of w
from the finite distribution, he fitted a Pearson Type I curve in a number of
examples and found that the 5 per cent and 1 per cent points in his fitted curves
did not differ much from the corresponding points of the normal-theory distribution
of w. His theoretical discussion showed, however, that if there is too much
discrepancy between the variances in the different blocks the randomization
test may seriously underestimate the significance of any differences between
the varieties as compared with normal theory.

It was Neyman [4] who first pointed out that, when the variations of fer- 
tility within each block are taken into account, the correlations between the 
observed yields should be allowed for, and the method adopted here is a de- 
velopment of his point of view. A number of authors, however, while agreeing 
that such variations of fertility do occur, hold that this does not seriously affect
the distribution of z.

6. Distribution of $u_1$ and $u_2$. As already stated, it is the distribution of
$v = u_2/u_1$ which will be sought, not that of z, where $u_1$ and $u_2$ are defined by
(14) and (15), or rather by (14a) and (15a), since the hypothesis tested is
assumed true. Writing $i = \sqrt{-1}$, the characteristic function of the simultaneous
distribution of $u_1$ and $u_2$, that is $E[\exp\{i(t_1 u_1 + t_2 u_2)\}]$, is found from (10).

From (14a) and (15a), by straightforward expansion, using the conventions
already explained for $\sum_{k,m}$, $\sum_{k \neq m}$ etc., we get

$u_1 = (ns)^{-1}\Big[(n-1)(s-1)\sum_{j,k} y_{jk}^2 - (n-1)\sum_{j}\sum_{k \neq m} y_{jk}y_{jm} - (s-1)\sum_{k}\sum_{j \neq l} y_{jk}y_{lk} + \sum_{j \neq l}\sum_{k \neq m} y_{jk}y_{lm}\Big],$

$u_2 = (ns)^{-1}\Big[(s-1)\Big\{\sum_{j,k} y_{jk}^2 + \sum_{k}\sum_{j \neq l} y_{jk}y_{lk}\Big\} - \sum_{j}\sum_{k \neq m} y_{jk}y_{jm} - \sum_{j \neq l}\sum_{k \neq m} y_{jk}y_{lm}\Big].$

And using these expressions with (10) the characteristic function of $u_1$ and $u_2$ is

$\varphi_{u_1,u_2}(t_1, t_2) = \int_{-\infty}^{\infty} p\{y_{jk}\} \exp[i(t_1 u_1 + t_2 u_2)]\, dY$

$\qquad = (A/\pi^s)^{n/2} \int_{-\infty}^{\infty} \exp\Big\{-\sum_{j,l}\sum_{k,m} B_{jk,lm}\, y_{jk}\, y_{lm}\Big\}\, dY$

where $dY = \prod_{j,k} dy_{jk}$ and the integral is an ns-fold one taken over the whole space
of these variables. $B_{jk,lm}$ is defined by

(17)  $B_{jk,lm} = \delta_{jl}A_{km} - i(ns)^{-1}(s\delta_{km} - 1)[t_1(n\delta_{jl} - 1) + t_2]$

where the δ's have the usual meaning, being equal to 1 when the suffixes are the
same and equal to zero when the suffixes are different. This integral, since the
real part of $\|B_{jk,lm}\|$ is positive definite, may readily be evaluated [23, 24] and
gives

(18)  $\varphi_{u_1,u_2}(t_1, t_2) = A^{n/2}/B^{1/2},$

B being the ns-rowed determinant $|B_{jk,lm}|$.
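The coefficients entering (17) can be checked numerically: if the coefficient of $y_{jk}y_{lm}$ in $u_1$ is taken to be $(ns)^{-1}(n\delta_{jl} - 1)(s\delta_{km} - 1)$ and that in $u_2$ to be $(ns)^{-1}(s\delta_{km} - 1)$, which is our reading of the expansion, the resulting quadratic forms should reproduce (14a) and (15a). The following sketch, with invented function names and random test values, makes the comparison.

```python
import random

def u1_u2_direct(y):
    """u1 and u2 from the definitions (14a) and (15a); y[j][k], n x s."""
    n, s = len(y), len(y[0])
    g = sum(map(sum, y)) / (n * s)
    bm = [sum(row) / s for row in y]
    vm = [sum(y[j][k] for j in range(n)) / n for k in range(s)]
    u1 = sum((y[j][k] - vm[k] - bm[j] + g) ** 2
             for j in range(n) for k in range(s))
    u2 = n * sum((m - g) ** 2 for m in vm)
    return u1, u2

def u1_u2_quadratic(y):
    """The same quantities built from the coefficients of y_jk * y_lm."""
    n, s = len(y), len(y[0])
    u1 = u2 = 0.0
    for j in range(n):
        for l in range(n):
            d_jl = 1 if j == l else 0
            for k in range(s):
                for m in range(s):
                    d_km = 1 if k == m else 0
                    prod = y[j][k] * y[l][m]
                    u1 += (n * d_jl - 1) * (s * d_km - 1) * prod
                    u2 += (s * d_km - 1) * prod
    return u1 / (n * s), u2 / (n * s)

rng = random.Random(0)
y = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # n = 3, s = 4
direct = u1_u2_direct(y)
quadratic = u1_u2_quadratic(y)
print(max(abs(a - b) for a, b in zip(direct, quadratic)))
```

The printed discrepancy should be at the level of rounding error only.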

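The determinant B may be written in the form

$B = \begin{vmatrix} [P] & [Q] & \cdots & [Q] \\ [Q] & [P] & \cdots & [Q] \\ \vdots & \vdots & & \vdots \\ [Q] & [Q] & \cdots & [P] \end{vmatrix}$

where $[P] = [p_{km}] = [B_{jk,jm}]$ and $[Q] = [q_{km}] = [B_{jk,lm}]$, $j \neq l$, and there are $n^2$ such
arrays in B. This gives at once $B = |p_{km} + (n-1)q_{km}| \cdot |p_{km} - q_{km}|^{n-1}$,
whence on substitution

(19)  $B = |A_{km} - it_2(s\delta_{km} - 1)/s| \cdot |A_{km} - it_1(s\delta_{km} - 1)/s|^{n-1}.$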
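The substitution leading to (19) may be spelled out. The lines below are a sketch which takes (17) in the form $B_{jk,lm} = \delta_{jl}A_{km} - i(ns)^{-1}(s\delta_{km} - 1)[t_1(n\delta_{jl} - 1) + t_2]$, which is our reading of the damaged original; p corresponds to j = l and q to j ≠ l.

```latex
\begin{align*}
p_{km} &= A_{km} - \frac{i}{ns}\,(s\delta_{km}-1)\,\{(n-1)t_1 + t_2\}, \\
q_{km} &= -\frac{i}{ns}\,(s\delta_{km}-1)\,\{-t_1 + t_2\}, \\
p_{km} - q_{km} &= A_{km} - \frac{i t_1}{s}\,(s\delta_{km}-1), \\
p_{km} + (n-1)\,q_{km} &= A_{km} - \frac{i t_2}{s}\,(s\delta_{km}-1).
\end{align*}
```

Thus $t_2$ enters only the single factor and $t_1$ only the factor raised to the power n − 1, which is what makes the later factorization of the characteristic function possible.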

The two determinants in (19) are identical, with $t_1$ and $t_2$ interchanged, and
are readily reduced to symmetrical (s − 1)-rowed determinants by: (a) Adding
to the terms in the last row the corresponding terms in the other rows and
repeating for columns, (b) Multiplying the terms in the last row successively
by $M_k/M$ ($k = 1, 2, 3, \ldots, s$) and subtracting from the corresponding terms
in each of the other rows, with

(20)  $M_k = \sum_{m=1}^{s} A_{km}$ and $M = \sum_{k=1}^{s} M_k$.


The following operations then reduce these (s − 1)-rowed determinants to ones
which are symmetrical and contain t's only in the diagonals: (i) To the terms
in the last column add the corresponding terms in all the other columns and
repeat for the rows, (ii) Multiply the terms in the last column by $(\sqrt{s} + 1)^{-1}$
and add them to the corresponding terms in each of the other columns, repeat 
for the rows, (iii) From the terms in the last column subtract the sum of the 
corresponding terms in the other columns multiplied by s”*, repeat for the rows, 
(iv) Divide the last row and the last column by The determinant then 
becomes $(M/s)\,|C - itI|$ where $\|C\|$ is the matrix $\|c_{km}\|$, I the unit matrix
and

(21)  $c_{km} = c_{mk} = A_{km} - (\sqrt{s}+1)^{-1}(A_{ks} + A_{ms}) + (\sqrt{s}+1)^{-2}A_{ss} - M^{-1}[M_k - (\sqrt{s}+1)^{-1}M_s][M_m - (\sqrt{s}+1)^{-1}M_s].$

It should be noted that henceforward k and m run through integral values
from 1 to s − 1 only unless the contrary is specifically stated.

Thus it follows that

$\varphi_{u_1,u_2}(t_1, t_2) = (As/M)^{n/2}\, |C - it_1 I|^{-(n-1)/2} \cdot |C - it_2 I|^{-1/2}.$

Putting $C = |c_{km}|$ and noting that $\varphi_{u_1,u_2}(t_1, t_2) = 1$ when $t_1 = t_2 = 0$, clearly
$As/M = C$ and the characteristic function factors into the form $\varphi_{u_1}(t_1) \cdot \varphi_{u_2}(t_2)$,
where

(22)  $\varphi_{u_1}(t_1) = C^{(n-1)/2}\, |C - it_1 I|^{-(n-1)/2}$

(23)  $\varphi_{u_2}(t_2) = C^{1/2}\, |C - it_2 I|^{-1/2}.$

This demonstrates that $u_1$ and $u_2$ are stochastically independent and that the
correlations introduced by allowing for the variations in fertility within the
blocks do not affect the independence already demonstrated [16, 25].

$\|C\|$ being a square positive definite matrix of rank $s - 1$, its characteristic
equation $|C - \lambda I| = 0$ must have $s - 1$ real positive roots. It follows that
$|C - itI|$ must factor into $s - 1$ factors of the type $a - it$, where $a$ is a real positive
constant. Some or all of these factors may be equal and various combinations
of factors of different multiplicity are possible depending on the value of $s$.
Only two cases will be considered here: (a) when all the roots of the characteristic
equation of $\|C\|$ are equal, and (b) when all the roots of the characteristic
equation are unequal.

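Suppose that all the roots of the characteristic equation are equal, say to $a$;
then $|C - itI| = (a - it)^{s-1}$ and $C = a^{s-1}$, giving

(24)  $\varphi_{u_1}(t) = (1 - it/a)^{-\frac{1}{2}(n-1)(s-1)}$

(25)  $\varphi_{u_2}(t) = (1 - it/a)^{-\frac{1}{2}(s-1)}$

It is seen at once that $u_1$ and $u_2$ are distributed as in (11), with $f_1 = (n - 1)(s - 1)$
and $f_2 = s - 1$, and thus $z$ or $v$ follow the usual distribution laws.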

Clearly when the variations of fertility within each block are neglected and the 
hypothesis tested is that X,nk) , or X, (*> , is independent of k, the roots of the 
characteristic equation are all equal. Then there is no correlation, o-* = cr, , 
Akk = (2cr’)~‘ = Ckk = a, Akm = 0 = c*„(fc ^ m) and the usual results are 
obvious. 

On the other hand when allowing for the variations of fertility within a block 
while testing the hypothesis that Xmk) is independent of k, the variances and 
covariances are all equal, i.e. cl = + al = c^, pkk = 1 and ptm = p = 

-a*/{(s - l)((r* + <rj)), k^m. This gives 


Akm - [{1 + (s — l)p}5*m — p](l ~ py 
Akm = (1 + (s — l)i/l!m "■ p)/2(7*A, 

Cjim ~ 5tm{2ff (1 p) } , 


where, as usual, 5*, 


_ ri k = m 
~ 1_0 k^m_ 


A= f 1 + (s - l)p)(l - py~\ 
A = l/(2ff )'A, 

(7 = {2/(1 - p)r^\ 

From this it follows the roots of the char- 


acteristic equation are all equal, a = {2/(1 — p))“' in (24) and (25). Thus in 
this case also, the z-test or its equivalent gives exact control of errors of the first 
kind. There is, however, this difference that «i/(n — l)(s — 1) and ua/(s — 1) 





are to be considered not as estimates of a but as estimates of ^*(1 - p) = 

(s - + (s — 

When $s = 2$, even though the variances differ, since there is only one root of
the characteristic equation, $a = (\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^{-1}$, the characteristic functions
are of the form (24) and (25). Consequently, in this case, $s = 2$, when only two
varieties are tested for the hypothesis that their average "true yields" are the
same on each block then, even though the varieties may react in different ways
to the fertility levels within the blocks, granting normality, the usual z-distribution
applies. This, of course, includes the case when, even though $\rho$ may be
zero, the variances differ. $u_1/(n - 1)$ and $u_2$ are to be considered as estimates of
$\frac{1}{2}(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$.

Proceeding next to the case in which all the roots of the characteristic equation
$|C - \lambda I| = 0$ are unequal, the roots are, say, $a_1 < a_2 < \cdots < a_{s-1}$ where, of
course, all these quantities are real and positive. This case will arise in testing
the hypothesis that the yield for each variety is the same for every block, that



= Xf.(m), while allowance is made for the different responses of the 
varieties to the differences in fertility within the blocks. The mathematical
formulation would be the same even if there were no correlations but the variances
were different for the different varieties. Then we have [26, 27, 28] for
both $u_1$ and $u_2$ from (22) and (23)

(26) p(w) = C’"(2ir)-‘ f e-“‘[n («* “ 

Jb 

with m = ^(n — 1) for Ui and m = J for «s . 

Replace $t$ by the complex variable $z$ and integrate round the contour shown in
Fig. 1. This contour consists of: (i) The real axis from $+R$ to $-R$, where
$R > a_{s-1}$, (ii) The quadrant $|z| = R$, $\pi < \arg z < 3\pi/2$, (iii) The imaginary
axis from $A[-iR]$ to $B[-(a_1 - r)i]$, cutting out the singularities by small semicircles
of radius $r$, as shown, (iv) The imaginary axis from $B$ to $A$ as in (iii), (v)
The quadrant $|z| = R$, $3\pi/2 < \arg z < 2\pi$. Within this contour $f(z)$ is analytic
and hence the contour integral is zero. It may also be readily seen that the
integrals over the two quadrants tend to zero as $R$ increases, and by examining
the changes in the amplitudes of $(a_k - iz)$, $k = 1, 2, \cdots, s - 1$, as the
contour circles the points $-ia_1, -ia_2, \cdots$, it will be seen that the integrals over
the straight lines between $(-ia_2, -ia_3)$, $(-ia_4, -ia_5)$, $\cdots$ cancel whether $m$ be
half an odd or half an even integer. Then

(26a) p(u) = C’"(2t)-‘ D jf (a, - iz)]-"*. 

The contours $D_r$ are those shown in Fig. 2 and consist of "dumb-bells" encircling
the points $(-ia_r, -ia_{r+1})$, $r = 1, 3, 5, \cdots$, in the negative direction. If $s$ is
even, the last integral consists of only one half the "dumb-bell" extending to
$-i\infty$. It may also be noted that if $n$ be odd and so $m$ an integer, the other
straight line integrals, those of the "dumb-bell" contours in question, also cancel
and leave only the contributions of the small circles about what are now
the poles.

Fig. 2

1^1 

Now we put iz = w and $,(«) = 11 («* ~ u))”” omitting the terms coniaivr 

fc**! 

ivjg a, and «,+! , it follows that is analytic in and on a circle of centre 
i(a, -|- ttr+i) which contains the points or and ar+i but not ur-i or ar+ 2 . Thus 
the function ^,{w) may in the interior of this circle be expanded as a uniformly 
convergent series in terms of + Or+i) — w giving, r = 1, 3, • • • , 

00 

^r(«) = S arp{i(ar -|- «r+l) - w}”, 

P-0 


(27) 
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Since termwise integration is then permissible it is necessary to consider only 
integrals of the form 

(K«r + — 'wYe~'"° dw 

Jor {(a, — «))(a,+i — in)}' 

where Dr is a contour similar to Dr but circling instead in a positive direction the 
points ar and ar+i on the real axis. We then have 


' rp 




( 26 b) 
Now if 


p(.u) -(rissryz 'La,. J„. 


- • L 


r p—0 




{(flr — w)(a,+i — m)}"* 


it is clear that Jrp is obtained by applying the operator < |(ar + ar+i) 4- ■ 

I 9m, 

p times to Jr • Now putting lo — § (a, + «r+i) = (wr+i — ctr)t, it follows that 


Jr = 




-Ju(afr+ar+l) 


f 

*'oo 


(14-,— 1+) — iuK<*r+l~*r) 


- 1)« 


dt 


e^^^iKar+i - ar)}**-‘ 
and this gives [29, p. 171], 

T - - ar)] 

2”'“*r(m){i(ar+i - ar))”-* ‘ 

Ip(z) is the Bessel Function of purely imaginary argument defined, — ir < arg 
* ^ iir, by 


(28) 


J^(z) = 22 


Hence it may be found that 
2xr(^) 


ror!r(M + r + 1)’ 


Jrp = ” «r)}] 

r(wi)(ar+l — ar)™ • 9 m’’ 


and this gives 


(29) p{u) = 


D™r(i) 

r(m) 


.-)«(or+“r+l) “ 


Y ]Corp— [M’"“*Jm-l{iM(ar+l “ Or)}] 

(ar+l — ttr)™^^ Sw” 


where Orp is defined by (27) with m = Kn — 1) for wi and m = i for wz ■ 

In the case e = 3 there is only one “dumb-bell” contour and i>i(M)) = 1, so 
that we get, for , W 2 > 0. 

(aia2)‘''‘-”r(i) 


(30) p(ui) = 


(as - aO^-^rlKM- 1)( 

(31) p{u2) = (aiaj)* e-l“='“’+“=> { ^M2(a2 - ai) } , 


e-‘“‘'“*+">)M{("~“>Ji(n-n{K(«* - «i)) 
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It may be noted that if the series in (28) is substituted in (30) and (31) these 
distributions may be considered as the sum of an infinite number of x^-distribu- 
tions all with the same <r^ but with different degrees of freedom. It may also be 
noted that with ai = ai all the terms, except the first, vanish and thus a single 
x^'-distribution is left. 

When s is even the last contour is one from + circling negatively. 
Using Hankel’s integral for the Gamma-Function, putting w — = f/w 

/»(aa-L+) ^(0+) 

J = t e-““(«._i - wr^dw = 

Joo Jcc 


2w 

r(m) 


Denoting by D differentiation with respect to u under the sign of integration 
and by D~^ the corresponding integration from zero to u. 


D-”! = i / (-u,)-^e-““(«._i - w)-”‘dw. 

• eo 


Then we can write 


n (a* - la)'” = (-w)-'‘-« 


a~2 

n 


(1 - «,/w)-” 


= i: 

P-0 

the expansion being justifiable since ! ak/w | < 1. Since (s — 2)m is an integer 
the additional term to be added to (29) to give p(w) is, therefore, 

(32) C"/r(m) ■ £ . 


7. Distribution of $v = u_2/u_1$. Though the distributions of $u_1$ and $u_2$ have
been given in a rather complicated form for any value of $s$ when the roots of
$|C - \lambda I| = 0$ are all unequal, the distribution of $v$ is given only for $s = 3$.
In this case, since $u_1$ and $u_2$ are independent, from (30) and (31)


p(ui, ih) = 


(aia2)‘"r(i) 


(aa — ai)*^" ®r{i(n — 1)} 


g-J(ui+uj)(ai+a2) 


/o{^Ma(a2 — ai)}/i(n-2){|Mi(as “ ai)) 


Now making the transformation «i = u,ut = uv with 
the .exponential term 


d(ui, un) 
9(m, a) 


== u and putting 


— Ju(H-w)Caii+a2) {w( 1 -f v)(ai 4- aj)}*' 

m) 


-f y)(«i + aj)}, 
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then OR integrating with respect to u over the whole range of variation, from 
2ero to infinity, we get 


_ (aia!2)*"((l + y)(o'i + f' 

" (a, - - 1)} Jo 


- ai)} 


( J-M(a2 — ai))At{|M(l + v)(ai -f 0:2)) du, 

Ki(z) being the modified Bessel Function of the second kind. 

This integral is a particular ease of one investigated by Bailey [ 30 , 31 ] and it 
gives 

(33) pW = + + ^)'’.i3V(l + t^)^l 

where $\beta = (a_2 - a_1)/(a_2 + a_1)$ and $F_4$ is Appell's fourth hypergeometric function
of two variables [32].

On performing a similar integration when $s > 3$, $p(v)$ may be obtained as a
rather complicated series of terms similar to (33).


8. Approximate Moments of the Distribution of $v$. As the distribution of $v$
is complicated even in the simplest case of $s = 3$ it appears advisable to examine
its moments even though only approximately. Writing $S_r = \sum_{k=1}^{s-1} a_k^{-r}$ and putting

$\prod_{k=1}^{s-1} (1 - t/a_k)^{-m} = \exp\left\{-m \sum_{k=1}^{s-1} \log(1 - t/a_k)\right\} = \exp\left\{m \sum_{r=1}^{\infty} S_r t^r / r\right\}$,

$\kappa_r$ being the $r$-th semi-invariant of $u$, we get

$\kappa_r = S_r m \cdot (r - 1)!$

Thence the mean of $u$ and its second, third and fourth moments about the mean are

$\bar{u} = mS_1$,  $\mu_2 = mS_2$,
$\mu_3 = 2mS_3$,  $\mu_4 = 3m\{2S_4 + mS_2^2\}$,

$m$ being $\frac{1}{2}(n - 1)$ in the case of $u_1$ and $\frac{1}{2}$ in the case of $u_2$.
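
The moment relations above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes that u may be simulated as a sum of independent gamma variables with common shape m and scales 1/a_k, which is the distribution whose generating function is the product of (1 - t/a_k)^(-m). The function names, the roots 1 and 2, and the sample size are hypothetical choices made only for illustration.

```python
import random


def theoretical_moments(a, m):
    """Mean and central moments of u implied by kappa_r = m * S_r * (r - 1)!."""
    def S(r):
        # S_r is the sum of the (-r)-th powers of the roots a_k.
        return sum(ak ** (-r) for ak in a)
    return {
        "mean": m * S(1),
        "mu2": m * S(2),
        "mu3": 2 * m * S(3),
        "mu4": 3 * m * (2 * S(4) + m * S(2) ** 2),  # kappa_4 + 3 * kappa_2**2
    }


def simulated_mean_and_variance(a, m, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the mean and variance of u.

    ASSUMPTION: u is a sum of independent gamma variables with shape m
    and scales 1/a_k, one variable per root a_k.
    """
    rng = random.Random(seed)
    draws = [sum(rng.gammavariate(m, 1.0 / ak) for ak in a)
             for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((x - mean) ** 2 for x in draws) / (n_draws - 1)
    return mean, var


if __name__ == "__main__":
    roots = [1.0, 2.0]  # hypothetical a_1, a_2 (the case s = 3)
    m = 1.0             # m = (n - 1)/2 with n = 3, i.e. the u_1 case
    print(theoretical_moments(roots, m))
    print(simulated_mean_and_variance(roots, m))
```

With the hypothetical roots used here the simulated mean and variance should lie close to mS_1 and mS_2; the agreement is only statistical, so a tolerance is needed when comparing.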

Now to get approximate moments for $v$ we write $\xi = (u_1 - \bar{u}_1)/\bar{u}_1$ and $\eta =
(u_2 - \bar{u}_2)/\bar{u}_2$ and, expanding in terms of $\xi$ and $\eta$, obtain

$v = (\bar{u}_2/\bar{u}_1)\{1 + \eta - \xi - \xi\eta + \xi^2 + \xi^2\eta - \cdots\}$

This gives, Mr being the r-th moment of v about the origin and writing Tr = 

Sr/S[==Zc^r /Ctat)\ 

k-l / / 

Ml = (n - 1)“'[1 -b 2 T 2 (n - 1)"* - STafn - 

-f 12{47’4 + (ra - i)r?l(n - 1)“’ • ■]> 
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JWb = (n - 1)~'(1 + 2 T 2)[1 + QTiin - 1)"' - Z2Ti{n ~ l)-= 

+ 60 { 4 T 4 + (n - l)r^)(w - i)-“ ...]_ 

lf3 = (m - m + 8r3)[l + \2Ti{n - 1)~* - 80r3(n - I)"' 

+ 180{4r4 + (R - l)rlj(n - l)-^..], 

M4 = (n - 1)“*|1 + I2T2 + 32T3 + 12(47’4 + Tj)}!! + 20r2(« - 1)-' 

- 160T,(w - 1)-" + 420{4!r4 + (n - l)T\){n - 1)"= . . .], 

The moments around the mean may readily be found if needed, If the a’s 
are all equal 

_ r(^/i - r)T{hS 2 + r) 

from the known distribution of the ratio of two x^’s with /j = (s — 1) and 
fi = (n ~ l)(s — 1) degrees of freedom respectively. Then developing ilfr a,s a 
series in terms of/T^ and/T^ 

M[ = (/3//o(i + 2/r^ + 4/r^ + . . .), 

M 2 = (/3//i)'(l + 2/2-‘)(1 + 6/i-‘ + 28/r“ + • . •), 

M[ = (/3//:)’(i + 6/i-* + 8/r')(i + 12/r' + ioo/r= +•■■), 

M[ = + 12^-‘ + 4^^ + 48/r’)(l + 20/r' + 260/x-^ +...)• 

It is then easily seen that the difference between these moments and those of $v$
when the $a$'s are unequal is due to the deviation of $T_r$ from $(s - 1)^{1-r}$, the value it
would have if the $a$'s were all equal.


9. Numerical Illustrations. The distribution of $v$ has been obtained in
workable form only when $s = 3$ and, consequently, it is only that case that is
considered here. In equation (33) the variable terms in the Appell function all
contain $\beta = (a_2 - a_1)/(a_2 + a_1)$ and it is this fraction or, perhaps better, its
square which measures, in a sense, the deviation of the distribution of $v$ from the
usual form. There are, therefore, two stages in this examination. It will first
be investigated how $\beta$ changes with the correlations and variances; and then the
changes in the "levels of significance" due to differences in $\beta$ will be examined.

Using (9), (20) and (21) it will be found that the equation $|C - \lambda I| = 0$ for
$s = 3$ becomes $4p\lambda^2 - 4q\lambda + 3 = 0$ with

$p = \sigma_2^2\sigma_3^2\Delta_{11} + \sigma_3^2\sigma_1^2\Delta_{22} + \sigma_1^2\sigma_2^2\Delta_{33} + 2\sigma_1\sigma_2\sigma_3(\sigma_1\Delta_{23} + \sigma_2\Delta_{31} + \sigma_3\Delta_{12})$,

$q = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 - \sigma_2\sigma_3\rho_{23} - \sigma_3\sigma_1\rho_{31} - \sigma_1\sigma_2\rho_{12}$,

where, of course, $\Delta = |\rho_{km}|$ and $\Delta_{km}$ is the cofactor of $\rho_{km}$ in $\Delta$. This equation
may readily be solved giving $a_1$ and $a_2$.

Taking first the case of zero correlation and putting $\sigma_1^2 = k_1\sigma^2$, $\sigma_2^2 = k_2\sigma^2$ and
$\sigma_3^2 = k_3\sigma^2$, it will be found that while $a_1$ and $a_2$ depend on the $k$'s and on $\sigma^2$, the
fraction $\beta = (a_2 - a_1)/(a_2 + a_1)$ depends only on the $k$'s. For different values
of the $k$'s the following table shows the values of $\beta^2$.


TABLE 1

Values of $\beta^2$ for different values of $\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = k_1 : k_2 : k_3$. No correlation.

    k_1     k_2     k_3     beta^2
    1       1       1       0
    1.0     1.1     1.2     0.003
    1.0     1.5     2.0     0.037
    1       2       3       0.083
    1       4       4       0.111
    1       9       9       0.177
    1       25      25      0.221
    1       1       4       0.250
    1       4       9       0.250
    1       9       16      0.250
    1       1       9       0.529
    1       1       25      0.790
    1       N       N       (N - 1)^2/(2N + 1)^2
    1       1       N       (N - 1)^2/(N + 2)^2


It is clear that to get a considerable value of $\beta^2$ one standard deviation
must be at least three times the other two. It also seems to produce a
considerably larger value of $\beta^2$ to have one large $k$ and two small $k$'s than to have
two large ones and one small, with the same order of magnitude of the ratio
large to small. Furthermore, when the ratios of the $\sigma^2$ are $1 : 1 : N$ the limit of
$\beta^2$ as $N$ increases is 1, while if the ratios are $1 : N : N$ the limit is 0.25.
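
The dependence of the squared fraction on the variances and correlations can be sketched in a few lines. The sketch below is an illustration and not the author's computation: it assumes the quadratic 4p*lambda^2 - 4q*lambda + 3 = 0 quoted above, from which the sum of the roots is q/p and their product is 3/(4p), so that the squared fraction equals 1 - 3p/q^2. The function name `beta_squared` and its argument layout are hypothetical.

```python
def beta_squared(variances, rho12=0.0, rho13=0.0, rho23=0.0):
    """Squared fraction ((a2 - a1)/(a2 + a1))**2 for s = 3.

    ASSUMPTION: a1 and a2 are the roots of 4*p*x**2 - 4*q*x + 3 = 0,
    with p and q built from the standard deviations, the correlations
    and the cofactors of the 3 x 3 correlation matrix.
    """
    s1, s2, s3 = (v ** 0.5 for v in variances)
    # Cofactors of the correlation matrix [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]].
    d11 = 1.0 - rho23 ** 2
    d22 = 1.0 - rho13 ** 2
    d33 = 1.0 - rho12 ** 2
    d12 = rho13 * rho23 - rho12
    d13 = rho12 * rho23 - rho13
    d23 = rho12 * rho13 - rho23
    p = ((s2 * s3) ** 2 * d11 + (s3 * s1) ** 2 * d22 + (s1 * s2) ** 2 * d33
         + 2.0 * s1 * s2 * s3 * (s1 * d23 + s2 * d13 + s3 * d12))
    q = (s1 ** 2 + s2 ** 2 + s3 ** 2
         - s2 * s3 * rho23 - s3 * s1 * rho13 - s1 * s2 * rho12)
    # (a2 - a1)**2 / (a2 + a1)**2 = 1 - 4 * (product of roots) / (sum of roots)**2
    return 1.0 - 3.0 * p / q ** 2


if __name__ == "__main__":
    for k in [(1, 1, 1), (1, 2, 3), (1, 4, 4), (1, 1, 4), (1, 1, 25)]:
        print(k, round(beta_squared(k), 3))
```

For zero correlation this reduces to 1 - 3(k_1 k_2 + k_2 k_3 + k_3 k_1)/(k_1 + k_2 + k_3)^2, which depends only on the ratios of the variances, in line with the statement preceding Table 1.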

Examining now the definition of , omitting the j's in equation (7), we find 
that pkm can be written in the form 

Pkm - -r*m(s - ir^[(l + <r!/<rjj(l + 

where 

0 _ 

s"* S MjKt) 

Tkm = 

and $r_{km}$ is itself a coefficient of correlation, i.e., the correlation between the
true yields. The second part of $\rho_{km}$ depends on $s$ and on the relative magnitudes
of the soil error and of the technical error. Its maximum value, in the case of
$s = 3$, is 0.5, which occurs when the technical error is small with respect to the
soil error. If both types of error have the same variance then the second term is
0.25. There appear to be no data available which enable us to assign values to
$r_{km}$, so the method adopted is to choose some values of $r_{km}$ which appear likely
to affect seriously the value of $\beta^2$ and then to take the second factor equal to 0.5.
If the values of $\rho_{km}$ are all equal and the variances are also equal the normal
theory has been shown to apply, and hence these values are taken to differ
considerably. Table 2 shows the effect on $\beta^2$ of taking different correlations with
various values of $\sigma_1^2 : \sigma_2^2 : \sigma_3^2$.

It is clear from the table that if there exist correlations of the order of magnitude
of those assumed, they can cause the distribution of $v$ to deviate considerably
from that which arises on the usual theory. For instance, if the variances are
equal the value of $\beta^2$ may be 0.444, a value it would attain if, with no correlations,
one variance was seven times the other two. Taking the cases in which
$\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = 1 : 4 : 9$ or $1 : 9 : 16$, the value of $\beta^2$ with no correlations is, in either case,
0.250, while with the correlations it may be as low as 0.008 or as high as 0.869.

TABLE 2 


Values of ff-for different values of the correlations and of cl : cl: cl. 


PIJ 

PU 


2 . 2 , 2 

P2a 

1:1.1 

1:4:9 

1:9.16 

1:25:26 

1:1:25 



0 

0 

m 

0 

0.250 

0.250 

0.221 

0.790 


-0.1 

bei 



0.083 

0.075 

0.059 

0.721 


-0.1 

-0.4/ 

1 

0.523 

0.549 

0.543 

0.851 


-0.25 

bBB 


J 

^0.132 

0.104 

0.056 

0.766 


-0.25 


1 

0.423 

0.429 

0.402, 

0.843 


0.2 

-0.4\ 

0.265 

J 

^0.706 

0.698 

0.658 

0.909 


0.2 

0.4J 

1 

10.028 

0.019 

0.020 

0.690 


0.4 

-0.4\ 

0.379 

J 

^0.793 

0.765 

0.709 

0.937 


0.4 


i 

10.016 

0.016 

0.038 

0.673 


0.4 

-0.5/ 

0.444 


10.869 

0.845 

0.793 

0.954 


0.4 

0.4/ 

i 

10.008 

0.010 

0.042 

0.664 


O.I 


0.189 


0.606 

0,596 

0.551 

0.890 


0.1 



/0.055 

0.035 

0.012 

0.721 


On the other hand when $\beta^2$ is large, in the case of zero correlation, say $\beta^2 = 0.790$
when $\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = 1 : 1 : 25$, the correlations, as might be expected, appear to
have less effect, the values of $\beta^2$ varying from 0.654 to 0.954. We may, therefore,
conclude that if such correlations exist their effect on $\beta^2$, and therefore on the
distribution of $v$, is certainly comparable with that of fairly large differences in
the variances.

We now examine 

p{v) dv 


P{t) > Wo 1/3} = j 
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for different values of Vo and j8. Writing p{v) in full from (33) and interchanging 
integration and summation we get 


P[v > Vo 1/3) = (a - 1)(1 - 0')*" £ 

ik-o u'rk\ 


•’va 

Changing the variable to as = (1 + »)“' the integral part becomes 

r a _ r(2; + l)r(2A; + „ - l) ^ 

, (l + «) r(2,- + ffl + n) L.(2fc + n- 1 , 23 + 1 ), 

in the notation usually employed [15], Substitution gives 
Pfv > Vo 1/3} = (1 - E 


j,Jb— 0 


2^ij\yk\ 


+ n _ 1, 2; + 1). 


Two sets of values of this expression were obtained, one for $n = 3$, and the
other for $n = 6$, while $\beta^2$ was given the values 0.1, 0.2, 0.3, 0.4, and 0.5. The
values of $x_0$ were chosen so as to cover the 1, 5 and 10 per cent significance levels.
Table 3 shows these results.


TABLE 3 


P(v > vo//3| for certain values of vo and d 
(a) n = 3 


Xo 

Vo 

Values of /9* 

0.0 

0.1 

0.2 

0.3 

0 4 

0.5 

0.05 

19 


0.003 

0.003 




0.10 

9 


0.011 

0.013 



HRS 

0.15 

5f 

0.0225 

0.026 

0.027 

BipmiI 



0.20 

4 


0.043 

0.048 




0.30 

2§ 


0.095 

0.100 


0.112 

0.117 

0.40 

u 

BIH 

0.165 

0.170 

0.175 

0.181 

0.187 


(b) n = 6 


Xo 

Vo 

Values of j8* 

0.0 

0.1 

0 2 

0.3 

0 4 

0.6 

0.3 


0.002 

0.003 



mi 


0.4 

H 

0.010 

0.012 

HR9 

HR9 

IB 


0.5 

1 

0.031 

0.035 


0.043 

IB 


0.6 

2 

J 

0.078 

0.082 

■nRiaR 



mm 

0.7 

i 

0.168 

0.171 

0.173 


0.181 
The 1, 5, and 10 per cent levels of significance for $x_0$ were obtained in both
cases by graphical interpolation and the corresponding values of $v_0$ then calculated.
Table 4 shows clearly the changes in these significance levels. It
must be remembered that values of $\beta^2$ considerably in excess of 0.5 may easily
arise.


• TABLE 4 


Changes in the levels of significance j8^ = 0 and 0.5, n = 3 and 6. 


1 

1 

1 

per cent 

5 

per cent. 

10 

per cent, 

To 

Vt 

1 

a:o ! 

Vo 

Xa 

Vo 

» = 3 1 
n = 6 j 

= 0 

i/9^ = 0.5 1 

= 0 
= 0.5 

0.10 

0.07 

0.40 

0.32 

9.00 
13.0 

1.51 

2.1 

0.22 

0.18 

0.55 

0.50 

3.47 

4.6 

0.82 

1.0 

0.32 

0.27 

0.63 

0.59 

2,16 

2.7 

0.58 

0.7 


This work shows quite clearly that the effect of any correlation between the 
yields, such as that introduced by variations of fertility within a block, or of any 
difference between the yield variance of different varieties tends to cause a 
significant deviation to be recognised when, in fact, none exists. When the 
number of varieties tested is three, the variation in the levels of significance 
may be quite large. 

10. Conclusion. The mathematical model appropriate to Randomized Block
Experiments is examined and it is suggested that the use of the z-test, as ordinarily
applied, is theoretically justifiable only when the variations in fertility
within each block are negligible.

Correlations between the yields of the varieties, due to randomization in a 
limited set, are introduced when the differences in fertility within each block are 
allowed for. 

It is suggested that, as a first approximation, a multinormal population may
be used for the yields from a given block, the variances and correlations being
assumed equal from block to block, though the means, of course, differ.

The simultaneous distribution of the usual sums of squares is found in this case,
and these sums of squares are shown to be independently distributed as the sums
of squares of $s - 1$ and $(n - 1)(s - 1)$ quantities from another multinormal
population.

It is shown that the usual distribution results apply when the variances and
correlations of all the varieties are equal as well, of course, as when the variances
are equal and the correlations zero. It is also shown that the same is true when
the number of varieties is two, though the variances may differ.

The distributions of the above sums of squares are obtained for all values of $s$,
the number of varieties, and the distribution of their ratio for $s = 3$. The
method of obtaining the distribution of the ratio for $s > 3$ is also indicated.

The relative importance of the deviations from the usual distribution produced 
by differences in the variances and differences in the correlations is examined 
when s = 3, and it is found when the variances are all equal that the latter can 
produce deviations comparable to one variance being seven times the other two. 

That the presence of the correlations or of non-equality of the variances causes 
a tendency for a significant difference to be found when none exists is clearly 
shown. 

In conclusion, I must express my gratitude to Prof. J. Neyman, now of the 
University of California, for suggesting this problem to me, to Dr. R. C. Geary 
and to Prof. S. S. Wilks for valuable suggestions. 
ON THE DISTRIBUTION OF ERRORS IN N-TH TABULAR DIFFERENCES¹
By Arnold N. Lowan and Jack Laderman

In the construction of mathematical tables, a frequent method of checking
the computed values of the tabulated function is to apply a differencing test.
This test consists of computing the tabular differences of some suitable order $n$
and comparing them with the theoretical values of the differences computed to a
higher degree of accuracy by an independent method. Whenever the absolute
deviation of a tabular difference $\bar{\Delta}^n$ from the corresponding theoretical difference
$\Delta^n$ exceeds some predetermined upper bound, the entries giving rise
to the difference in question are investigated. Thus, in the computation of the
functions Si(x), Ci(x), Ei(x) and Ei(-x), it was found desirable to check the
final manuscript by comparing the tabular second differences with the values
of the second differences computed to a higher degree of accuracy by an independent
method.²

A study of the distribution of errors suggested the following problem: If we
assume a rectangular distribution of the errors in the entries of a mathematical
table, what is the distribution of errors in the $n$-th tabular differences?

For the sake of mathematical simplicity, it will be convenient to idealize the
problem as follows: Consider $n + 1$ random numbers $x_0, x_1, x_2, \cdots, x_n$, drawn
from any rectangular distribution. When arranged in a definite order, these
$n + 1$ values give rise to an $n$-th difference $\Delta^n$. If these $n + 1$ numbers are rounded
to $k$ decimal places, the new approximate values $\bar{x}_0, \bar{x}_1, \cdots, \bar{x}_n$ will give rise to
another $n$-th difference $\bar{\Delta}^n$.

We shall investigate the distribution of the error $\Delta^n - \bar{\Delta}^n$.

The explicit expression for $\Delta^n - \bar{\Delta}^n$ is given by:

$\Delta^n - \bar{\Delta}^n = C_0^n E_n - C_1^n E_{n-1} + \cdots + (-1)^n C_n^n E_0 = w_0 + w_1 + w_2 + \cdots + w_n$ (say)

where $E_i = x_i - \bar{x}_i$ and $C_i^n$ are the binomial coefficients.
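
The idealized problem can be imitated by simulation. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: the entries are drawn uniformly from (0, 1), Python's built-in `round` stands in for the rounding to k decimals, and the function names are hypothetical. It relies on two consequences of the expression above: the error is a sum of independent rectangular variables of widths 10^(-k) C_i^n, so its variance is 10^(-2k) times the sum of the squared binomial coefficients divided by 12, and its absolute value cannot exceed 2^(n-1) 10^(-k).

```python
import random
from math import comb


def nth_difference(values):
    """n-th forward difference of n + 1 consecutive values."""
    n = len(values) - 1
    return sum((-1) ** (n - i) * comb(n, i) * values[i] for i in range(n + 1))


def simulate_errors(n, k, n_draws=100_000, seed=0):
    """Errors of the n-th difference caused by rounding entries to k decimals.

    ASSUMPTION: entries are uniform on (0, 1), so the individual rounding
    errors are (very nearly) rectangular on (-0.5 * 10**-k, 0.5 * 10**-k).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_draws):
        exact = [rng.random() for _ in range(n + 1)]
        rounded = [round(x, k) for x in exact]
        errors.append(nth_difference(exact) - nth_difference(rounded))
    return errors


def theoretical_variance(n, k):
    """Variance of a sum of independent rectangular variables of widths 10**-k * C(n, i)."""
    return 10.0 ** (-2 * k) * sum(comb(n, i) ** 2 for i in range(n + 1)) / 12.0


if __name__ == "__main__":
    errs = simulate_errors(n=2, k=3)
    mean = sum(errs) / len(errs)
    var = sum((e - mean) ** 2 for e in errs) / (len(errs) - 1)
    print(var, theoretical_variance(2, 3))
    print(max(abs(e) for e in errs), 2 ** (2 - 1) * 10.0 ** -3)
```

The comparison of the sample variance with the theoretical one is statistical, so the two printed values agree only approximately.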


¹ The results reported in this paper were obtained in the course of work done by the
Project for the Computation of Mathematical Tables, O. P. No. 765-97-3-10, Work
Projects Administration, N. Y. C., operated under the sponsorship of Dr. Lyman J. Briggs,
Director of the National Bureau of Standards. The authors wish to express their appreciation
to the W. P. A. and to the Sponsor of this Project for permission to publish these
results.

² The above functions were computed for x = 0(0.0001)2.0000 to 9 places of decimals and
x = 2(0.001)10.000 to 9 decimals or significant figures. For the independent method of
computation of the second differences, see article by A. N. Lowan in the Bulletin of the
American Mathematical Society, August, 1939.
The distribution of any one of the $E$'s is

$f(E)\,dE = \begin{cases} 10^k\,dE & \text{if } -\frac{1}{2}10^{-k} < E < \frac{1}{2}10^{-k} \\ 0 & \text{elsewhere.} \end{cases}$

The subsequent developments are based on the fundamental theorem which
states that the characteristic function of the distribution of the sum of any
number of random variables is the product of the characteristic functions of the
distributions of the individual variables.³ The characteristic function, $g(t)$,
of $f(x)$ is defined as follows:

(1)  $g(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx.$

As is well known, the inversion of (1) is given by:

(2)  $f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} g(t)\,dt.$

It can be easily seen that the distribution of $w_i$ is:

$f(w_i)\,dw_i = \begin{cases} \dfrac{10^k}{C_i^n}\,dw_i & \text{if } -\dfrac{10^{-k} C_i^n}{2} < w_i < \dfrac{10^{-k} C_i^n}{2} \\ 0 & \text{elsewhere} \end{cases}$

and its characteristic function is:

$\dfrac{\sin \frac{1}{2}(10^{-k} C_i^n t)}{\frac{1}{2}(10^{-k} C_i^n t)}$

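On the basis of the theorem mentioned above, the characteristic function of the
distribution of $\Delta^n - \bar{\Delta}^n = y$ (say) is:

(3)  $G(t) = \prod_{i=0}^{n} \dfrac{\sin \frac{1}{2}(10^{-k} C_i^n t)}{\frac{1}{2}(10^{-k} C_i^n t)}$

The desired frequency function, $F(y)$, is given by the inversion of

$G(t) = \int_{-\infty}^{\infty} e^{ity} F(y)\,dy.$

From (2) and (3), we get:

$F(y) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ity} \prod_{i=0}^{n} \dfrac{\sin \frac{1}{2}(10^{-k} C_i^n t)}{\frac{1}{2}(10^{-k} C_i^n t)}\,dt$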
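
The inversion can also be carried out numerically as a rough check on the closed forms derived in the rest of the paper. The sketch below is an illustration only: it writes h for 10^(-k), uses the fact that G(t) is real and even so that F(y) equals 1/pi times the integral from 0 to infinity of cos(ty) G(t), and truncates that integral at a hypothetical upper limit `t_max` with a plain trapezoidal rule. For n = 2 the integrand decays like the inverse cube of t, so the truncation error is small; for n = 1 the decay is slower and a larger limit would be needed.

```python
from math import comb, cos, pi, sin


def char_fn(t, n, h=1.0):
    """G(t) of equation (3), with h standing for 10**(-k)."""
    g = 1.0
    for i in range(n + 1):
        x = 0.5 * h * comb(n, i) * t
        g *= 1.0 if x == 0.0 else sin(x) / x
    return g


def density_by_inversion(y, n, h=1.0, t_max=200.0, steps=100_000):
    """Approximate F(y) = (1/pi) * integral_0^t_max cos(t*y) * G(t) dt.

    t_max and steps are hypothetical tuning values, not taken from the paper.
    """
    dt = t_max / steps
    total = 0.5 * (char_fn(0.0, n, h) + cos(t_max * y) * char_fn(t_max, n, h))
    for j in range(1, steps):
        t = j * dt
        total += cos(t * y) * char_fn(t, n, h)
    return total * dt / pi


if __name__ == "__main__":
    # With n = 2 and h = 1 the density at the origin should be close to 1/2.
    print(density_by_inversion(0.0, 2))
    print(density_by_inversion(1.5, 2))
```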


³ See, for instance, Harald Cramér, Random Variables and Probability Distributions,
Cambridge Tracts in Mathematics and Mathematical Physics No. 36 (1937) p. 36.

which may be written as:

(4)  $F(y) = \dfrac{10^{k(n+1)}}{\pi \prod_{i=0}^{n} C_i^n} \int_{-\infty}^{\infty} \cos(2ty) \cdot \prod_{i=0}^{n} \sin(10^{-k} C_i^n t) \cdot \dfrac{dt}{t^{n+1}}.$

The problem now reduces to the evaluation of the integral in (4). In the
evaluation of this integral, it is convenient to consider even and odd values of $n$
separately.

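Case 1. When $n$ is even.

$\prod_{i=0}^{n} \sin(10^{-k} C_i^n t) = \dfrac{(-1)^{\frac{1}{2}n}}{2^n} \left[P_{n+1} - P_n + P_{n-1} - \cdots + (-1)^{\frac{1}{2}n} P_{\frac{1}{2}n+1}\right]$

where $P_{n+1-r}$ denotes the sum of the sines of the sum of $n + 1 - r$ of the angles taken
positively and the remaining $r$ taken negatively, the negative angles being taken
in every combination.⁴ Thus $\cos(2ty) \prod_{i=0}^{n} \sin(10^{-k} C_i^n t)$ can be expressed as
the sum of products of a sine by a cosine. By employing the identity $\sin A
\cos B = \frac{1}{2}\{\sin(A + B) + \sin(A - B)\}$, each term can be written as the sum
of two sines. Hence the integral under consideration can be written as the
sum of integrals of the form

$\int_{-\infty}^{\infty} \dfrac{\sin at}{t^{n+1}}\,dt.$

Integrating by parts $n$ times in succession, we obtain:

(5)  $\int_{-\infty}^{\infty} \dfrac{\sin at}{t^{n+1}}\,dt = \dfrac{(-1)^{\frac{1}{2}n} a^n}{n!} \int_{-\infty}^{\infty} \dfrac{\sin at}{t}\,dt.$

But

$\int_{-\infty}^{\infty} \dfrac{\sin at}{t}\,dt = \begin{cases} \pi & \text{for } a > 0 \\ -\pi & \text{for } a < 0. \end{cases}$

Therefore

(6)  $\int_{-\infty}^{\infty} \dfrac{\sin at}{t^{n+1}}\,dt = \begin{cases} \dfrac{(-1)^{\frac{1}{2}n} a^n \pi}{n!} & \text{for } a > 0 \\ -\dfrac{(-1)^{\frac{1}{2}n} a^n \pi}{n!} & \text{for } a < 0. \end{cases}$

By use of (6) the integral in (4) can be readily evaluated.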

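Case 2. When $n$ is odd.

$\prod_{i=0}^{n} \sin(10^{-k} C_i^n t) = \dfrac{(-1)^{\frac{1}{2}(n+1)}}{2^n} \left[Q_{n+1} - Q_n + Q_{n-1} - \cdots\right]$

where $Q_{n+1-r}$ denotes the sum of cosines of the sum of $n + 1 - r$ of the angles
taken positively and the remaining $r$ taken negatively, the negative angles being
taken in every combination. As in Case 1, the integral in (4) can be expressed
as the sum of integrals of the form:

$\int_{-\infty}^{\infty} \dfrac{\cos at}{t^{n+1}}\,dt.$

Integrating by parts, we obtain:

$\int_{-\infty}^{\infty} \dfrac{\cos at}{t^{n+1}}\,dt = -\dfrac{a}{n} \int_{-\infty}^{\infty} \dfrac{\sin at}{t^n}\,dt.$

The second member of this equation has been treated in Case 1. It follows
that:

(7)  $\int_{-\infty}^{\infty} \dfrac{\cos at}{t^{n+1}}\,dt = \begin{cases} \dfrac{(-1)^{\frac{1}{2}(n+1)} a^n \pi}{n!} & \text{for } a > 0 \\ \dfrac{(-1)^{\frac{1}{2}(n-1)} a^n \pi}{n!} & \text{for } a < 0. \end{cases}$

⁴ See E. W. Hobson, Plane Trigonometry, Seventh Edition (1928) pp. 50-51.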

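By means of the integrals (6) and (7), $F(y)$ can be obtained for any $n$. The
results for $n = 1$, 2, and 3 are given below:

$n = 1$.

$F(y) = \begin{cases} 10^{2k} y + 10^k & \text{for } -10^{-k} \le y \le 0 \\ -10^{2k} y + 10^k & \text{for } 0 \le y \le 10^{-k} \\ 0 & \text{elsewhere} \end{cases}$

$n = 2$.

$F(y) = \begin{cases} \dfrac{10^{3k}}{4} y^2 + 10^{2k} y + 10^k & \text{for } -2 \cdot 10^{-k} \le y \le -10^{-k} \\ -\dfrac{10^{3k}}{4} y^2 + \dfrac{10^k}{2} & \text{for } -10^{-k} \le y \le 10^{-k} \\ \dfrac{10^{3k}}{4} y^2 - 10^{2k} y + 10^k & \text{for } 10^{-k} \le y \le 2 \cdot 10^{-k} \\ 0 & \text{elsewhere} \end{cases}$

$n = 3$.

$F(y) = \begin{cases}
\dfrac{10^{4k}}{54} y^3 + \dfrac{2 \cdot 10^{3k}}{9} y^2 + \dfrac{8 \cdot 10^{2k}}{9} y + \dfrac{32 \cdot 10^k}{27} & \text{for } -4 \cdot 10^{-k} \le y \le -3 \cdot 10^{-k} \\
-\dfrac{10^{4k}}{54} y^3 - \dfrac{10^{3k}}{9} y^2 - \dfrac{10^{2k}}{9} y + \dfrac{5 \cdot 10^k}{27} & \text{for } -3 \cdot 10^{-k} \le y \le -2 \cdot 10^{-k} \\
\dfrac{10^{2k}}{9} y + \dfrac{10^k}{3} & \text{for } -2 \cdot 10^{-k} \le y \le -10^{-k} \\
-\dfrac{10^{4k}}{27} y^3 - \dfrac{10^{3k}}{9} y^2 + \dfrac{8 \cdot 10^k}{27} & \text{for } -10^{-k} \le y \le 0 \\
\dfrac{10^{4k}}{27} y^3 - \dfrac{10^{3k}}{9} y^2 + \dfrac{8 \cdot 10^k}{27} & \text{for } 0 \le y \le 10^{-k} \\
-\dfrac{10^{2k}}{9} y + \dfrac{10^k}{3} & \text{for } 10^{-k} \le y \le 2 \cdot 10^{-k} \\
\dfrac{10^{4k}}{54} y^3 - \dfrac{10^{3k}}{9} y^2 + \dfrac{10^{2k}}{9} y + \dfrac{5 \cdot 10^k}{27} & \text{for } 2 \cdot 10^{-k} \le y \le 3 \cdot 10^{-k} \\
-\dfrac{10^{4k}}{54} y^3 + \dfrac{2 \cdot 10^{3k}}{9} y^2 - \dfrac{8 \cdot 10^{2k}}{9} y + \dfrac{32 \cdot 10^k}{27} & \text{for } 3 \cdot 10^{-k} \le y \le 4 \cdot 10^{-k} \\
0 & \text{elsewhere.}
\end{cases}$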




In general, $F(y)$ is an even continuous function, vanishing for $|y| > 2^{n-1} 10^{-k}$
and defined by different $n$-th degree polynomials in different intervals.
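
The piecewise polynomials can be generated for any n without evaluating the integrals case by case. The sketch below does not follow the paper's route through (6) and (7); it swaps in the standard truncated-power formula for the density of a sum of independent rectangular variables, applied to the widths h C_i^n with h standing for 10^(-k). The function name and the unit choice h = 1 are hypothetical.

```python
from itertools import product
from math import comb, factorial


def error_density(y, n, h=1.0):
    """Density of the error in the n-th difference at the point y.

    Uses the truncated-power formula for a sum of n + 1 independent
    rectangular variables of widths h * C(n, i), centred at zero:
    f(u) = sum over subsets S of (-1)**|S| * max(u - sum_S w, 0)**n
           divided by n! * prod(w), with u measured from the left end.
    """
    widths = [h * comb(n, i) for i in range(n + 1)]
    u = y + sum(widths) / 2.0  # shift so that the support starts at 0
    total = 0.0
    for picks in product((0, 1), repeat=n + 1):
        shift = sum(w for w, p in zip(widths, picks) if p)
        if u > shift:
            total += (-1) ** sum(picks) * (u - shift) ** n
    norm = float(factorial(n))
    for w in widths:
        norm *= w
    return total / norm


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [round(error_density(y, n), 4) for y in (0.0, 0.5, 1.5, 3.5)])
```

Evaluating the function at a few points of each interval gives a quick numerical comparison with the branches listed for n = 1, 2 and 3.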

The above frequency functions were derived on the assumption that the $x$'s
are random numbers drawn from a rectangular distribution. However, the
results may be applied to the entries of a mathematical table provided the
rounding errors are horizontally distributed and the difference under consideration
is of such an order that the digits in the decimal place corresponding to the
last place given in the table are also horizontally distributed. These conditions
are frequently satisfied. Since data on the errors in the second differences of a
table of $\mathrm{Ci}(x) = \int_{\infty}^{x} \dfrac{\cos u}{u}\,du$ given to 9 decimals were available, a study was
made of a sample of 1000 of these errors. The theoretical and observed frequencies
for this sample are given in the following table:


    Error            Theoretical Frequency    Observed Frequency
    -2.0 to -1.5            10.4                      9
    -1.5 to -1.0            72.9                     68
    -1.0 to -0.5           177.1                    161
    -0.5 to  0             239.6                    272
     0   to  0.5           239.6                    243
     0.5 to  1.0           177.1                    174
     1.0 to  1.5            72.9                     63
     1.5 to  2.0            10.4                     10
    Total                 1000.0                   1000

By applying Pearson's $\chi^2$-test, it is found that the observed frequencies show no
significant deviations from the theoretical frequencies.
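
The comparison can be reproduced from the n = 2 frequency function and the observed counts in the table. The sketch below is an illustration under stated assumptions: errors are measured in units of 10^(-k), the cell probabilities are obtained by integrating the n = 2 density exactly, the statistic is the usual sum of (observed - expected)^2 / expected, and 7 degrees of freedom are assumed (8 cells, no fitted parameters). The function names are hypothetical.

```python
def cumulative_from_zero(y):
    """Integral of the n = 2 error density from 0 to y, in units of 10**-k."""
    sign = 1.0 if y >= 0 else -1.0
    y = abs(y)
    if y <= 1.0:
        value = y / 2.0 - y ** 3 / 12.0        # middle branch: 1/2 - y**2/4
    else:
        value = 0.5 - (2.0 - y) ** 3 / 12.0    # outer branch: (2 - y)**2/4
    return sign * value


def expected_frequencies(edges, sample_size):
    """Expected counts in the cells defined by consecutive edges."""
    return [sample_size * (cumulative_from_zero(hi) - cumulative_from_zero(lo))
            for lo, hi in zip(edges[:-1], edges[1:])]


def chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)**2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    edges = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
    observed = [9, 68, 161, 272, 243, 174, 63, 10]
    expected = expected_frequencies(edges, sum(observed))
    print([round(e, 1) for e in expected])
    # Compare with 14.067, the upper 5 per cent point of chi-square with 7 d.f.
    print(round(chi_square(observed, expected), 2))
```

The statistic obtained in this way lies well below the 5 per cent point for 7 degrees of freedom, which agrees with the conclusion stated in the text.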


New York 




ON TESTING THE HYPOTHESIS THAT TWO SAMPLES HAVE BEEN 
DRAWN FROM A COMMON NORMAL POPULATION 

By B. A. Lengyel*

I. Introduction. This paper is devoted to the problem of testing the hypothesis
that two samples of 2, 3 and 4 variables, and of equal size, have been drawn
from a common unspecified normal population. It is, in a certain sense, a
continuation of J. W. Fertig's papers [1, 2] which were devoted to the problem
of testing the hypothesis that one or more samples of $n$ variables have been
drawn from a completely or partially specified normal population.

For the sake of application to biological research, it is important to have 
means of determining whether two samples may have come from a common 
population even if this population is unknown. Moreover, it is often imperative 
to test two samples with respect to all their variables simultaneously. Much 
valuable information may be lost if the variables are tested individually. One 
has to consider not only the fact that two samples which differ almost signifi- 
cantly from each other in each variable separately might be significantly different 
if the probabilities would be combined, but one has to take account of the 
possible correlations between the variables which are completely disregarded if 
the tests are applied to each variable separately. It is not difficult to imagine 
two samples of two variables with identical means and variances and signifi- 
cantly different correlation coefficients. 

J. Neyman and E. S. Pearson [3] have investigated the problem of testing statistical
hypotheses in general. They have developed the method of likelihood
ratios. It is beyond the scope of the present paper to give an account of this
theory; we have to restrict ourselves to statements concerning the fundamental
concepts we are going to apply to our specific problem.

A sample with one variable and of size N can be regarded as a point in an
N-dimensional space. The acceptance or rejection of a hypothesis concerning
this sample will depend on whether or not the point representing the sample is
contained in certain critical regions determined by the hypothesis and by the
statistical criterion that is to be applied. The choice of the critical regions is of
fundamental importance; its implications have been thoroughly discussed by 
Neyman and Pearson. These authors found a useful criterion for testing the 
hypothesis that a sample was drawn from a specified member of an admissible 
set of populations by introducing the ratio of the likelihood that the sample 
was drawn from the specified population to the maximum value of the likelihood 
for all populations in the admissible set (Cf. §2). This ratio λ can vary between
0 and 1. The association between values of λ and the credibility of the hypothesis
in question is such that the greater the value of λ the greater the degree of
tenability of the hypothesis. λ = constant defines a surface in the sample
space. These surfaces are the contours of the critical regions associated with
the acceptance or rejection of the hypothesis. A hypothesis is rejected as
untenable if λ is so small that

    \int_0^{\lambda} P(\lambda)\, d\lambda < \alpha,

where α is some value arbitrarily small, say .01 or .05, and P(λ) is the
distribution of λ if the hypothesis is true.

* From the Research Service of the Worcester State Hospital, Worcester, Massachusetts.

This method of testing hypotheses is evidently not restricted to one sample 
with one variable, nor is it restricted to simple hypotheses. A simple hypothesis 
is one which is associated with one completely specified population. A com- 
posite hypothesis is one which is associated with a subset of the admissible 
populations. For example, the hypothesis that a sample with n variables has
been drawn from a normal population with means a_1, ..., a_n, whatever may
be the variances and correlation coefficients, is a composite hypothesis. Such is
the hypothesis that two or more samples have come from a common but un- 
specified population. 

The problem of several samples with one variable was discussed by Neyman 
and Pearson [4, 5]; the problem of several samples with two variables by Pearson 
and Wilks [6]. In another paper Wilks [7] derived formulas for λ and the
moments of P(λ) for the most general case of k samples of n variables. For the
sake of practical applications it is necessary to have tables for the limits of
significance of λ. Such tables have been prepared for samples with one variable
by Neyman and Pearson and for completely or at least partially specified
populations and more than one variable by Fertig [1, 2]. The present paper
contains tables for the case of 2, 3 and 4 variables and a common unspecified normal
population. Since the case of two variables has been theoretically solved by
Pearson and Wilks we shall have to compare our results with those of the above
authors, who derived the distribution P(λ) but did not compute tables.

Our procedure is the following: We start with the moments of P(λ) as given
by Wilks and approximate the distribution of λ^{1/N} by a suitable function. Then
we compute the limits of significance for this approximating function. This 
procedure was originally suggested by Neyman and Pearson and was applied 
with some modifications by Fertig. 

§2 contains the definition of the likelihood ratio λ; §3 deals with the moments
of its distribution for the case of a common unspecified population. In §4 we
introduce the approximating distribution y = C x^{p-1} (1 - x)^{q-1} and discuss the
determination of the parameters p and q. In §5 we give an independent derivation
of the formula obtained by Pearson and Wilks for P(λ) for the case of two
samples with two variables and compare our approximation with the exact
formula. §6 deals with the determination of the limits of significance and
contains the tables. §7 is devoted to an example.

2. Definition of λ. Let C_π denote the probability of obtaining a given sample
from a population π. C_π will depend on the parameters of the population and
the data of the sample. Let Ω be the set of all admissible populations and ω a
subset of Ω which corresponds to a certain hypothesis that is to be tested.
Intuitively one would consider a hypothesis tenable or plausible if it gives a high
probability density for the given sample if compared with other possible
hypotheses. Following this reasoning Neyman and Pearson defined the likelihood
of a hypothesis as the ratio of \max_{\pi \in \omega} C_\pi to \max_{\pi \in \Omega} C_\pi. In the special case which
we propose to investigate, the populations are assumed to be normal. We wish
to test the hypothesis that two given samples have come from a common
unspecified population. Hence λ, the likelihood of this hypothesis, is the maximum
likelihood that the samples have come from a common normal population
divided by the maximum likelihood that the samples have come from any two
normal populations.

The value of λ can be expressed by the variates of the samples by means of the
following formula [Cf. [7] p. 489]

(1)    \lambda = \frac{S_1^{N_1/2}\, S_2^{N_2/2}}{S_0^{(N_1+N_2)/2}},

where S_1 and S_2 are the generalized variances* of the samples and S_0 is the
generalized variance of the sample obtained by pooling the two given samples.
N_1 is the size of the first sample, N_2 the size of the second. In the case of equal
samples, to which we shall restrict ourselves, N_1 = N_2 = N; thus

(2)    \lambda^{1/N} = \frac{\sqrt{S_1 S_2}}{S_0}.

3. The Moments of the λ Distribution. The distribution of λ depends on the
number of variables, the number and the size of the samples and on the kind of
hypothesis that is to be tested; e.g. that the samples have come from a common
unspecified population. This distribution has been evaluated for the case of
equal samples of one and two variables and our hypothesis concerning a common
unspecified population. The general form of this distribution is still unknown
and even the known formula for two variables is not very suitable for computation.
Therefore we shall follow the procedure introduced by Neyman and
Pearson [4] and we shall use the known moments of the unknown distribution
function P(λ) in order to construct an approximation to P(λ). For two equal
samples of n variables the moments of P(λ) about the origin are [Cf. [7] p. 490]

(3)    M_h = 2^{nNh} \prod_{i=1}^{n} \frac{\left[\Gamma\left(\frac{N(1+h)-i}{2}\right)\right]^2 \Gamma\left(\frac{2N-i}{2}\right)}{\left[\Gamma\left(\frac{N-i}{2}\right)\right]^2 \Gamma\left(\frac{2N(1+h)-i}{2}\right)}

for h = 1, 2, 3, ... .

* The generalized variance of a sample is a determinant, the elements of which are the
variances and covariances. Thus, for two variables x and y the generalized variance
S = S_x^2 S_y^2 (1 - r^2), where S_x^2 and S_y^2 denote the variances of x and y respectively, r the
correlation coefficient.


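Equation (2) readily suggests that we should compute or approximate the
distribution of λ^{1/N} rather than that of λ. Let μ_h denote the h-th moment
of λ^{1/N}; then μ_h = M_{h/N} follows immediately from the definition
M_h = \int_0^1 \lambda^h P(\lambda)\, d\lambda. Hence in order to obtain the μ's we have to replace Nh by
h in (3):

(4)    \mu_h = 2^{nh} \prod_{i=1}^{n} \frac{\left[\Gamma\left(\frac{N+h-i}{2}\right)\right]^2 \Gamma\left(\frac{2N-i}{2}\right)}{\left[\Gamma\left(\frac{N-i}{2}\right)\right]^2 \Gamma\left(\frac{2N+2h-i}{2}\right)}.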

This expression can be much simplified for all given values of N and n. However,
there is no need for such simplification, because one has to compute the
first moments only. All higher moments can be expressed by means of the first
moments for various N's. The dependence of μ_h on N is evident from (4); we
shall indicate it by writing μ_h(N). The ratio of two subsequent moments is

(5)    \frac{\mu_{h+1}(N)}{\mu_h(N)} = 2^{n} \prod_{i=1}^{n} \frac{\left[\Gamma\left(\frac{N+h+1-i}{2}\right)\right]^2 \Gamma\left(\frac{2(N+h)-i}{2}\right)}{\left[\Gamma\left(\frac{N+h-i}{2}\right)\right]^2 \Gamma\left(\frac{2(N+h+1)-i}{2}\right)} = \mu_1(N + h).


Equation (5) contains an important relation of the moments. In fact from (5)
follows:

    \mu_2(N) = \mu(N)\,\mu(N + 1),

    \mu_3(N) = \mu(N)\,\mu(N + 1)\,\mu(N + 2),

    ...

    \mu_h(N) = \mu(N)\,\mu(N + 1) \cdots \mu(N + h - 1),

where the subscripts 1 of the μ_1(N)'s have been omitted. This last group of equations
holds for any number of variables. Thus we have to compute μ(N) for each N;
then multiplication gives the higher moments.

For*

    n = 2,    \mu(N) = \frac{(N-2)^2}{(N-1)(N-\frac{1}{2})};

    n = 3,    \mu(N) = 8\left[\frac{\Gamma(N/2)}{\Gamma((N-3)/2)}\right]^2 \frac{1}{(N-\frac{1}{2})(N-1)(N-\frac{3}{2})};

(6)

    n = 4,    \mu(N) = \frac{(N-2)^2 (N-4)^2}{(N-1)(N-\frac{1}{2})(N-2)(N-\frac{3}{2})}.
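
The relations of this section lend themselves to a quick numerical check. The sketch below is only an illustration under stated assumptions: it takes the moment formula to be \mu_h = 2^{nh} \prod_{i=1}^{n} [\Gamma((N+h-i)/2)]^2 \Gamma((2N-i)/2) / \{[\Gamma((N-i)/2)]^2 \Gamma((2N+2h-i)/2)\}, which is how equation (4) is read here, and the function names are hypothetical. It compares the first moment with the closed forms for n = 2 and n = 4 and verifies that a higher moment is the product of first moments at N, N + 1, ..., N + h - 1.

```python
import math


def log_mu(N, n, h):
    """Log of the h-th moment of lambda**(1/N) for two equal samples of
    size N and n variables (gamma-function product assumed for eq. (4))."""
    total = n * h * math.log(2.0)
    for i in range(1, n + 1):
        total += 2.0 * math.lgamma((N + h - i) / 2.0)
        total += math.lgamma((2 * N - i) / 2.0)
        total -= 2.0 * math.lgamma((N - i) / 2.0)
        total -= math.lgamma((2 * N + 2 * h - i) / 2.0)
    return total


def mu(N, n, h=1):
    """h-th moment of lambda**(1/N); h = 1 gives the mu(N) of the text."""
    return math.exp(log_mu(N, n, h))


def mu_from_first_moments(N, n, h):
    """Product mu(N) * mu(N+1) * ... * mu(N+h-1) implied by relation (5)."""
    prod = 1.0
    for k in range(h):
        prod *= mu(N + k, n, 1)
    return prod


def mu1_two_variables(N):
    """Closed form of mu(N) for n = 2."""
    return (N - 2) ** 2 / ((N - 1) * (N - 0.5))


def mu1_four_variables(N):
    """Closed form of mu(N) for n = 4."""
    return ((N - 2) ** 2 * (N - 4) ** 2) / (
        (N - 1) * (N - 0.5) * (N - 2) * (N - 1.5)
    )


if __name__ == "__main__":
    for N in (30, 50):
        print(N, round(mu(N, 2), 4), round(mu1_two_variables(N), 4))
    print(mu(20, 3, 3), mu_from_first_moments(20, 3, 3))
```

Working with logarithms of gamma functions avoids overflow for larger N; this is a choice made for the sketch and says nothing about how the original computations were carried out.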


4. Approximation of the distribution of λ^{1/N}. Following the procedure of
Neyman and Pearson we shall use the moments computed in the previous
section for the fitting of a Pearson frequency curve to the unknown distribution
of λ^{1/N}. Since 0 ≤ λ^{1/N} ≤ 1 it is natural to fit a frequency curve of the following
type

(7)    y = C x^{p-1} (1 - x)^{q-1},

where

    C = \frac{1}{B(p, q)} = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)}.

The first two moments are sufficient to determine the two parameters p and q.
The moments of the distribution (7) are readily computed:

(8)    \nu_1 = \frac{p}{p + q},    \nu_2 = \frac{p + 1}{p + q + 1} \cdot \frac{p}{p + q} = \nu_1 \frac{p + 1}{p + q + 1}.

In general:

(9)    \frac{\nu_{h+1}}{\nu_h} = \frac{p + h}{p + q + h}.

Equation (9) corresponds to equation (5) since one can write ν_1 = ν(p, q); then
(9) becomes

(10)    \frac{\nu_{h+1}(p, q)}{\nu_h(p, q)} = \nu(p + h, q).

At first sight the similarity of equations (5) and (10) would suggest that one
should choose p and q so that ν(p + h, q) = μ(N + h), for h = 0, 1, 2, 3, ... .
However, this cannot be done because the equations which express the equality
of the first two moments:

(11)    \nu(p, q) = \mu(N)

(12)    \nu(p + 1, q) = \mu(N + 1)

determine p and q completely. The quantities ν(p + h, q) can be only approximately
equal to μ(N + h) for h > 1. The goodness of fit may be tested by
comparing the third and fourth moments.

* The case n = 1 is omitted here since it has been treated by Neyman and Pearson [4].

The advantage of equations (5) and (10) is that once p and q have been
computed one does not have to compare

    \nu_3 = \nu_2\, \nu(p + 2, q)

with

    \mu_3 = \mu_2\, \mu(N + 2),

but since ν_2 = μ_2 one can compare ν(p + 2, q) with μ(N + 2). Similarly the
comparison of the fourth moments can be replaced by comparing ν(p + 3, q)
with μ(N + 3). It has to be remembered that once the sequence of μ(N)'s has
been computed for all N's, each of its terms can be used four times in the
determination and the checking of p's and q's.

The general procedure for the determination of p and q is to compute the
μ(N)'s first and then solve the equations (11) and (12): i.e.

(13)    \frac{p}{p + q} = \mu(N),

(14)    \frac{p + 1}{p + q + 1} = \mu(N + 1).

The solution of these equations is:

(15)    p = \mu(N)\, \frac{1 - \mu(N + 1)}{\mu(N + 1) - \mu(N)},

(16)    q = p \left[\frac{1}{\mu(N)} - 1\right].

As N increases μ(N) approaches 1 from below; μ(N + 1) - μ(N) will be very
small. E.g., for n = 2 as N varies from 30 to 50 μ(N) increases from .9164
to .9499. It is easily seen that small errors in μ may produce much larger errors
in p and q. For n = 4 it was necessary to compute μ to nine decimal places
to get p and q to three decimal places accurately. For n = 2 equations (13) and
(14) become

(17)    \frac{p}{p + q} = \frac{(N - 2)^2}{(N - 1)(N - \frac{1}{2})},

(18)    \frac{p + 1}{p + q + 1} = \frac{(N - 1)^2}{N (N + \frac{1}{2})}.

These can be solved explicitly:

(19)    p = (N - 2)\left[1 - \frac{3(N - 1)}{5N^2 - 9N + 1}\right],

(20)    q = 2.5 + \frac{4.5}{5N^2 - 9N + 1}.


The last two equations enable us to compute p and q directly and thus avoid the
more laborious computation by means of the μ(N)'s. For n = 3 and 4, however,
such a short cut was not found. The computation of μ's for n = 4 was facilitated
by the following relation:

(21)    {}_4\mu(N) = {}_2\mu(N)\, \frac{(N - 4)^2}{(N - 2)(N - \frac{3}{2})},

where the prefixed suffixes denote the number of variables in the problem to which the
μ's refer. Thus the computed values of {}_2μ(N) were again used. Eight-place
logarithms were used in the computation of {}_3μ(N) from the formula at the
end of §3.
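
The determination of p and q described in this section can be sketched in a few lines. The example below is illustrative only: it assumes the first moment for two variables is μ(N) = (N - 2)^2 / ((N - 1)(N - 1/2)), solves p/(p + q) = μ(N) and (p + 1)/(p + q + 1) = μ(N + 1) in the general way, and compares the result with the explicit solution p = (N - 2)[1 - 3(N - 1)/(5N^2 - 9N + 1)], q = 2.5 + 4.5/(5N^2 - 9N + 1) as these formulas are read here. The function names are hypothetical.

```python
def mu1_two_variables(N):
    """First moment mu(N) of lambda**(1/N) for n = 2 (assumed closed form)."""
    return (N - 2) ** 2 / ((N - 1) * (N - 0.5))


def fit_beta_parameters(mu_N, mu_N1):
    """Solve p/(p+q) = mu(N) and (p+1)/(p+q+1) = mu(N+1) for p and q."""
    p = mu_N * (1.0 - mu_N1) / (mu_N1 - mu_N)
    q = p * (1.0 / mu_N - 1.0)
    return p, q


def closed_form_two_variables(N):
    """Explicit p and q for n = 2, as the closed forms are read here."""
    d = 5 * N * N - 9 * N + 1
    p = (N - 2) * (1.0 - 3.0 * (N - 1) / d)
    q = 2.5 + 4.5 / d
    return p, q


if __name__ == "__main__":
    for N in (10, 30, 50):
        general = fit_beta_parameters(mu1_two_variables(N), mu1_two_variables(N + 1))
        explicit = closed_form_two_variables(N)
        print(N, general, explicit)
```

Because p depends on the small difference μ(N + 1) - μ(N), the general solution loses several significant figures for large N, which is consistent with the remark that μ had to be computed to nine decimal places for n = 4.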


5. The distribution of λ^{1/N} for two variables. For two variables it is possible
to evaluate the distribution of λ^{1/N} or some other suitable power of λ directly
from the moments. Pearson and Wilks (Cf. [6] pp. 364-368) derived the
distribution for this case. Their method was adapted to the treatment of
more general problems than ours. It is possible to derive the distribution of
λ^{1/N} in our special case more directly:

For n = 2 the moments of λ^{1/N} are:

(22)    \mu_h = 2^{2h} \left[\frac{\Gamma\left(\frac{N+h-1}{2}\right) \Gamma\left(\frac{N+h-2}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{N-2}{2}\right)}\right]^2 \frac{\Gamma(N - \frac{1}{2})\, \Gamma(N - 1)}{\Gamma(N + h - \frac{1}{2})\, \Gamma(N + h - 1)},    h = 1, 2, 3, ... .

Applying the following transformation formula*

(23)    \Gamma(z)\, \Gamma(z + \tfrac{1}{2}) = \frac{\sqrt{\pi}}{2^{2z-1}}\, \Gamma(2z)

to

    z_1 = \tfrac{1}{2}(N + h - 2),  z_2 = \tfrac{1}{2}(N - 2),  z_3 = N - 1  and  z_4 = N + h - 1,

(22) becomes

(24)    \mu_h = 2^{2h} \left[\frac{\Gamma(N + h - 2)}{\Gamma(N - 2)}\right]^2 \frac{\Gamma(2N - 2)}{\Gamma(2N + 2h - 2)}.

Thus μ_h is of the form F(N + h)/F(N) with

    F(N) = 2^{2N}\, \frac{\Gamma(N - 2)^2}{\Gamma(2N - 2)}.

* Cf. Whittaker and Watson, Modern Analysis, 4th ed., p. 240.

Our problem is to find a function P(t) such that

(25)    \mu_h = \int_0^1 t^h P(t)\, dt,

for h = 0, 1, 2, ... .

This problem is solved if we can find a function of N and t, say, φ(N, t) such
that

(26)    F(N + h) = \int_0^1 t^h \varphi(N, t)\, dt

for h = 0, 1, 2, ... .

N and h enter the left side of equation (26) symmetrically. The same must be
true for the right side. Hence φ(N, t) must have the form t^N f(t), where f(t) is
independent of N. If then (26) is satisfied for all N and h = 0, it is also satisfied
for all N and all h.

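Let us now examine F(N). Applying again the transformation (23) we can
bring it to the form:

(27)    F(N) = 2^3\, \frac{\Gamma(N - 2)\sqrt{\pi}}{\Gamma(N - \frac{1}{2})\,(N - 2)} = 2^4\, \frac{\Gamma(N - 2)\, \Gamma(\frac{3}{2})}{(N - 2)\, \Gamma(N - \frac{1}{2})} = \frac{2^4}{N - 2}\, B(N - 2, \tfrac{3}{2}).

Now B(N - 2, 3/2) can be represented as an integral of the desired type

(28)    B(N - 2, \tfrac{3}{2}) = \int_0^1 t^{N-3} (1 - t)^{1/2}\, dt.

We set φ(N, t) = 2^4 t^{N-3} g(t) and seek to determine g(t) so that (26) will be
satisfied for all N and for h = 0: i.e.,

(29)    F(N) = \frac{2^4}{N - 2}\, B(N - 2, \tfrac{3}{2}) = 2^4 \int_0^1 t^{N-3} g(t)\, dt.

An integration by parts with g(1) = 0 gives

(30)    \frac{1}{N - 2}\, B(N - 2, \tfrac{3}{2}) = -\frac{1}{N - 2} \int_0^1 t^{N-2} g'(t)\, dt.

This equation evidently holds for all N if and only if

(31)    -t\, g'(t) = \sqrt{1 - t}.

This differential equation is readily solved by the substitution y = \sqrt{1 - t}.
In fact it becomes

(32)    \frac{dg(y)}{dy} = \frac{2y^2}{1 - y^2} = 2\left[y^2 + y^4 + y^6 + \cdots\right].

Hence

(33)    g = \log\frac{1 + y}{1 - y} - 2y = 2\left[\log\frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t}\right].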

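The complete solution for the distribution function is

(34)    P(t)\, dt = \frac{\Gamma(2N - 2)}{2^{2N-5}\, \Gamma(N - 2)^2}\; t^{N-3} \left[\log\frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t}\right] dt

with t = λ^{1/N}, in accordance with the formula of Pearson and Wilks.* Integration
by parts gives

(35)    P(\lambda^{1/N} < t) = \frac{\Gamma(2N - 2)}{2^{2N-5}\, \Gamma(N - 1)\, \Gamma(N - 2)} \left\{ t^{N-2} \left[\log\frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t}\right] + \frac{1}{2} \int_0^t y^{N-3} \sqrt{1 - y}\, dy \right\}.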

One can use this last equation to determine the limits of significance. How- 
ever, this was not done when the tables of this paper were computed. The 
approximation of the distribution function by the function described in equation 
(7) was deemed sufficient and the use of the tables of the incomplete beta function 
greatly facilitated the computation. 

In concluding this section we wish to demonstrate the goodness of approximation
of the exact distribution function by a function of the type C t^{p-1} (1 - t)^{q-1}
with p and q given by equations (19) and (20).

For small values of t the shape of the curve is determined by the exponent of
t, which is exactly N - 3 for the distribution function and nearly N - 3 - 3/5
for the approximating function. For large t, i.e., small 1 - t, the exponent of
(1 - t) is the determining factor. By (32) we have

    g(\sqrt{1 - t}) = 2\left[\frac{(1 - t)^{3/2}}{3} + \frac{(1 - t)^{5/2}}{5} + \cdots\right]

or approximately (2/3)(1 - t)^{3/2}. For the approximating curve q - 1 = 3/2 + O(1/N^2),
which is even better agreement. It is easily seen that the goodness of
approximation increases with N.


6. Determination of the Levels of Significance. The final task was to compute
the values of x which satisfy the equations:

    I_x(p, q) = \alpha

with α = .01 and α = .05. This was done by interpolation in the Tables of the
Incomplete Beta Function [8]. In these tables the argument x increases by steps
of .01. A value x_0 was determined by inspection, so that I_{x_0}(p, q) < α but
I_{x_1}(p, q) > α where x_1 = x_0 + .01. The values of I_{x_0}(p, q) and I_{x_1}(p, q) were
determined by interpolation with respect to p and q, using the two-dimensional
Everett formula, neglecting fourth and higher differences; x was then determined
by linear interpolation. It is worth while to mention that the terms of
second order in Everett's formula decreased quite rapidly as N increased. Once
this was noticed some labor was saved by not computing the terms of
second order for values of N between 30 and 50 but by estimating the second
order terms from those obtained for N = 30, 40 and 50.

* Cf. [6] p. 388 Equation 60 (i* = t).
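
With present-day means the interpolation in printed tables can be replaced by a direct numerical inversion of the incomplete beta function. The following sketch is not the procedure used for the tables; it integrates the fitted curve by Simpson's rule and locates the level by bisection. It assumes p > 1 and q > 1 (so that the density vanishes at both ends), and it takes p and q for two variables from the closed forms p = (N - 2)[1 - 3(N - 1)/(5N^2 - 9N + 1)], q = 2.5 + 4.5/(5N^2 - 9N + 1) as read from §4. All function names are hypothetical.

```python
import math


def beta_cdf(x, p, q, steps=2000):
    """Incomplete beta ratio I_x(p, q) by Simpson's rule (assumes p > 1, q > 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_c = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)

    def density(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp(log_c + (p - 1.0) * math.log(u) + (q - 1.0) * math.log(1.0 - u))

    h = x / steps
    total = density(0.0) + density(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return total * h / 3.0


def lower_limit(alpha, p, q):
    """Value x with I_x(p, q) = alpha, found by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, p, q) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_q_two_variables(N):
    """p and q for n = 2 from the closed forms assumed above."""
    d = 5 * N * N - 9 * N + 1
    return (N - 2) * (1.0 - 3.0 * (N - 1) / d), 2.5 + 4.5 / d


if __name__ == "__main__":
    p, q = p_q_two_variables(10)
    print(round(lower_limit(0.01, p, q), 3), round(lower_limit(0.05, p, q), 3))
```

For N = 10 and two variables the two limits obtained in this way should fall close to the tabulated entries .395 and .507.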


                 Levels of Significance of λ^{1/N}

  Sample       2 Variables       3 Variables       4 Variables
  Size N       1%      5%        1%      5%        1%      5%

    10        .395    .507      .238    .328      .122    .184
    11        .437    .546      .282    .374      .153    .217
    12        .475    .579      .323    .414      .198    .270
    13        .508    .610      .360    .451      .233    .308
    14        .537    .634      .393    .483      .267    .343
    15        .563    .656      .423    .512      .298    .376
    16        .586    .676      .451    .537      .328    .405
    17        .607    .694      .476    .561      .365    .432
    18        .626    .710      .500    .582      .380    .466
    19        .644    .724      .521    .601      .404    .479
    20        .660    .737      .541    .619      .426    .501
    22        .687    .760      .576    .650      .466    .538
    24        .711    .779      .606    .676      .501    .571
    26        .731    .795      .632    .699      .532    .599
    28        .749    .809      .655    .719      .560    .624
    30        .765    .821      .675    .736      .584    .646
    32        .778    .832      .694    .752      .606    .666
    34        .791    .842      .710    .765      .626    .683
    36        .802    .850      .724    .778      .644    .700
    38        .811    .858      .738    .789      .660    .714
    40        .820    .865      .750    .799      .676    .727
    42        .828    .871      .761    .808      .689    .739
    44        .836    .877      .771    .816      .702    .760
    46        .843    .882      .780    .824      .713    .760
    48        .849    .887      .789    .831      .724    .769
    50        .854    .891      .796    .837      .734    .773

7. An Example. The problem chosen to illustrate the use of the tables is
taken from a study on insulin-treated schizophrenic patients of the Worcester
State Hospital. An attempt was made to differentiate between those patients who
recovered after treatment and those who did not recover. Blood constituents
and blood pressure were determined among other variables.

The variables in this example are designated as x = blood phosphorus, y =
cholesterol in mg./100 cc., z = blood pressure in mm. Hg. The statistics for the
10 "recovered" patients are:

    S_x^2 = 2.222             S_y^2 = 376.60            S_z^2 = 51.97
    r_{12} S_x S_y = -1.121   r_{13} S_x S_z = -8.217   r_{23} S_y S_z = 12.51

For the 10 "not-recovered" patients:

    S_x^2 = 3.120             S_y^2 = 816.19            S_z^2 = 96.32
    r_{12} S_x S_y = 26.23    r_{13} S_x S_z = 2.92     r_{23} S_y S_z = 65.78

For the total group of 20:

    S_x^2 = 3.034             S_y^2 = 609.02            S_z^2 = 83.09
    r_{12} S_x S_y = 10.41    r_{13} S_x S_z = -.845    r_{23} S_y S_z = 15.99

These values give for the generalized variances of the samples 17,462; 168,628;
and 143,904, respectively.

Hence

    \lambda^{1/10} = \frac{\sqrt{17,462 \times 168,628}}{143,904} = .377.

The 5% limit of significance is .328, hence the two groups do not differ
significantly from each other.
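
The arithmetic of this example is easily repeated. The short sketch below assumes that for two equal samples the statistic is λ^{1/N} = sqrt(S_1 S_2) / S_0, with S_1, S_2 the generalized variances of the two samples and S_0 that of the pooled sample, and it uses the three generalized variances quoted above; the function name is hypothetical.

```python
import math


def lambda_root(s1, s2, s0):
    """lambda**(1/N) for two equal samples, assumed to be sqrt(S1 * S2) / S0."""
    return math.sqrt(s1 * s2) / s0


# Generalized variances quoted in the example (recovered, not recovered, pooled).
value = lambda_root(17462.0, 168628.0, 143904.0)

# Tabulated 5% limit for N = 10 and three variables.
limit_5_percent = 0.328

# The hypothesis of a common population is rejected only for small values.
significant = value < limit_5_percent

if __name__ == "__main__":
    print(round(value, 3), significant)
```

Since the computed value lies above the tabulated limit, the sketch reaches the same verdict as the text: no significant difference at the 5% level.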
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NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


THE DISTRIBUTION OF “STUDENT'S” RATIO FOR SAMPLES OF 
TWO ITEMS DRAWN FROM NON-NORMAL UNIVERSES 

By Jack Laderman 


The fundamental assumption in the derivation of "Student's" distribution¹

    f(t)\, dt = \frac{\Gamma(n/2)}{\sqrt{(n-1)\pi}\;\Gamma((n-1)/2)} \left(1 + \frac{t^2}{n-1}\right)^{-n/2} dt

is that the universe sampled is normal. When the universe sampled is non-normal
and n is small, the distribution of t differs considerably from "Student's"
distribution. In 1929, Rider² derived the distribution of t for samples of two
drawn from the rectangular distribution

    f(x)\, dx = \begin{cases} dx & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}


In this paper, the formal expression for the distribution of t will be derived
for samples of two drawn from any population having a continuous frequency
function. A geometrical method similar to that employed by Rider will be used.
Let the universe sampled have the frequency function, f(x), with zero mean,
and let f(x) be greater than zero from x = a to x = b and equal to zero elsewhere.
Suppose the two observations are x_1 and x_2.

Then

    \bar{x} = \frac{x_1 + x_2}{2}

and

    t = \frac{\sqrt{n}\, \bar{x}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}} = \frac{\sqrt{2}\, \bar{x}}{\sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}}.

The sample (x_1, x_2) can be represented by a point in a square of side b - a,
as point P in Figure 1.

[Fig. 1]

The coordinates of F are (\bar{x}, \bar{x}),

    OF = -\sqrt{2}\, \bar{x},

    FP = \sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2},

therefore

    t = -\frac{OF}{FP} = -\cot\theta.

Similarly for a point lying below AB, the value of t is -cot θ. Hence all
points on OS and its image OR have the same value of t.

¹ "Student", "The Probable Error of a Mean," Biometrika, Vol. VI (1908), pp. 1-25.
² Paul R. Rider, "On the Distribution of the Ratio of the Mean to Standard Deviation
in Small Samples from Non-Normal Universes," Biometrika, Vol. XXI (1929), pp.
124-143.


Let

    \alpha = \frac{5\pi}{4} - \theta.

Then

    \tan\alpha = \frac{t + 1}{t - 1}

and the equation of OS is

    x_2 = \frac{t + 1}{t - 1}\, x_1.

The probability of getting a sample point in the element of area dx_1 dx_2 is
f(x_1) f(x_2) dx_1 dx_2. Therefore the probability of getting a value of t less than
the value represented by a point on OS is given by

(1)    2 \int_a^0 \int_{x_1}^{\frac{t+1}{t-1} x_1} f(x_1) f(x_2)\, dx_2\, dx_1.

By differentiating (1), we get the frequency function

    g(t) = -\frac{4}{(t - 1)^2} \int_a^0 f(x)\, f\left(\frac{t + 1}{t - 1}\, x\right) x\, dx.

However, this expression is valid only when t < cot φ, where φ is the angle
between OB and OC in Figure 1. From Figure 1 we notice that

    \cot\varphi = \frac{a + b}{b - a}.

Hence the above expression, g(t), is valid for t ≤ (a + b)/(b - a).

When t > (a + b)/(b - a), the probability of obtaining a value of t greater than the
value represented by a point on OS or its image OR, as in Figure 2, is given by

    2 \int_0^b \int_{\frac{t-1}{t+1} x_2}^{x_2} f(x_1) f(x_2)\, dx_1\, dx_2

[Fig. 2]

and the distribution function is

(2)    1 - 2 \int_0^b \int_{\frac{t-1}{t+1} x_2}^{x_2} f(x_1) f(x_2)\, dx_1\, dx_2.

After differentiating (2), we obtain the frequency function

    g(t) = \frac{4}{(t + 1)^2} \int_0^b f(x)\, f\left(\frac{t - 1}{t + 1}\, x\right) x\, dx.

Thus, the frequency function of t for samples of two can be obtained from the
expressions:

(3)    g(t) = \begin{cases} -\dfrac{4}{(t - 1)^2} \displaystyle\int_a^0 f(x)\, f\left(\dfrac{t + 1}{t - 1}\, x\right) x\, dx & \text{for } t \le \dfrac{a + b}{b - a} \\[2ex] \dfrac{4}{(t + 1)^2} \displaystyle\int_0^b f(x)\, f\left(\dfrac{t - 1}{t + 1}\, x\right) x\, dx & \text{for } t \ge \dfrac{a + b}{b - a} \end{cases}

These expressions may also be used when a, or b, or both are infinite. However,
the join point (a + b)/(b - a) is then indeterminate, but by consideration of Figure 1, it
can easily be seen that the join points are as follows:

    a = -∞,    b finite:    t = -1
    a finite,  b = +∞:      t = 1
    a = -∞,    b = +∞:      t = 0

The expressions given by (3) have been verified by obtaining the distribution
of t for samples of two drawn from the normal distribution and also from the
rectangular distribution. The explicit distributions were found very easily from
(3) by performing the integrations.

For instance when

    f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2},    -\infty < x < +\infty,

we get

    g(t) = \frac{1}{\pi (1 + t^2)},    -\infty < t < +\infty,

which agrees with Student's distribution for n = 2.

And when

    f(x) = \begin{cases} 1 & \text{for } -\frac{1}{2} < x < \frac{1}{2} \\ 0 & \text{elsewhere} \end{cases}

we get

    g(t) = \begin{cases} \dfrac{1}{2 (t - 1)^2} & \text{for } t < 0 \\[1.5ex] \dfrac{1}{2 (t + 1)^2} & \text{for } t > 0 \end{cases}

which agrees with the distribution found by Rider as corrected by Perlo.³
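
The two verifications mentioned above can also be carried out numerically. The sketch below is an illustration under stated assumptions: it takes the frequency function of t to be g(t) = -4/(t - 1)^2 ∫_a^0 f(x) f((t + 1)x/(t - 1)) x dx for t ≤ (a + b)/(b - a) and g(t) = 4/(t + 1)^2 ∫_0^b f(x) f((t - 1)x/(t + 1)) x dx otherwise, which is how expression (3) is read here; it evaluates the integrals by Simpson's rule; and it replaces the infinite range of the normal law by the interval (-10, 10). The rectangular universe is taken on (-1/2, 1/2). All function names are hypothetical.

```python
import math


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi] with an even number of steps."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0


def t_density(f, a, b, t):
    """Frequency function of t for samples of two from the density f on (a, b),
    using the two branches assumed for expression (3); requires a < 0 < b."""
    join = (a + b) / (b - a)
    if t <= join:
        r = (t + 1.0) / (t - 1.0)
        integral = simpson(lambda x: f(x) * f(r * x) * (-x), a, 0.0)
        return 4.0 * integral / (t - 1.0) ** 2
    s = (t - 1.0) / (t + 1.0)
    integral = simpson(lambda x: f(x) * f(s * x) * x, 0.0, b)
    return 4.0 * integral / (t + 1.0) ** 2


def normal_pdf(x):
    """Standard normal density (sigma = 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def rectangular_pdf(x):
    """Rectangular density on (-1/2, 1/2)."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0


if __name__ == "__main__":
    for t in (-2.0, 0.3):
        print(t, t_density(normal_pdf, -10.0, 10.0, t), 1.0 / (math.pi * (1.0 + t * t)))
    print(t_density(rectangular_pdf, -0.5, 0.5, -1.0), 1.0 / (2.0 * (-1.0 - 1.0) ** 2))
    print(t_density(rectangular_pdf, -0.5, 0.5, 2.0), 1.0 / (2.0 * (2.0 + 1.0) ** 2))
```

The truncation of the normal range at ten standard deviations is a convenience of the sketch; the neglected tails are far below the accuracy of the quadrature.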


New York 

³ Victor Perlo, "On the Distribution of Student's Ratio for Samples of Three Drawn
from a Rectangular Distribution," Biometrika, Vol. XXV (1933), pp. 203-204.

ON SOME INFINITE SERIES INTRODUCED BY TSCHUPROW

By J. B. D. Derksen


In his fundamental work on the principles of the theory of correlation Tschuprow
introduces some infinite series, leaving certain questions regarding their
convergence or divergence unsolved.¹

As will be shown in the following note, these series are what may be termed
"randomly divergent,"² that is, series involving random variables which may take
on values which will make the series divergent. This result is of importance:
e.g. the well-known formula for the standard error of a correlation coefficient,
(1 - ρ^2)/\sqrt{N}, is the first term of an infinite series for which the question of
convergence has not been carefully considered.

Tschuprow finds himself confronted by infinite series when dealing with the
mathematical expectations of quotients, e.g. correlation coefficients, or sums
of quotients, e.g. the mean square contingency. Let us consider a two-dimensional
discontinuous universe, where the variables are x and y. Let p_{ij}
be the probability of the occurrence of the pair of values x_i, y_j. The probability
that x assumes the value x_i equals \sum_{j=1}^{l} p_{ij} = p_{i|} (i = 1, ..., k; j = 1, ..., l).

When taking a sample of N pairs of observations (x, y) the relative frequency
of x_i will be p'_{i|}, and that of the pair (x_i, y_j) will be p'_{ij}. In accordance with
Tschuprow we put p'_{ij} = p_{ij} + dp_{ij} and p'_{i|} = p_{i|} + dp_{i|}, where dp_{ij} and dp_{i|}
are random variables.

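As one of the simplest cases we consider the mathematical expectation of

    \frac{p'_{ij}}{p'_{i|}} = \frac{p_{ij}}{p_{i|}} \left(1 + \frac{dp_{ij}}{p_{ij}}\right) \left(1 + \frac{dp_{i|}}{p_{i|}}\right)^{-1}.

Now Tschuprow develops the last factor in an infinite binomial series, getting

(1)    \frac{p'_{ij}}{p'_{i|}} = \frac{p_{ij}}{p_{i|}} \left(1 + \frac{dp_{ij}}{p_{ij}}\right) \left[1 - \frac{dp_{i|}}{p_{i|}} + \left(\frac{dp_{i|}}{p_{i|}}\right)^2 - \cdots\right].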

He has given general formulae (Biometrika, vol. XII, p. 194 (1919)) from which
the mathematical expectations of the terms of this infinite series may immediately
be found. We get an infinite series containing ascending powers of N
in the denominator. Finally the convergence or divergence of this series has to
be investigated, the problem left unsolved by Tschuprow.

The series expansion of (1 + dp_{i|}/p_{i|})^{-1}, however, diverges for values dp_{i|} such
that |dp_{i|}/p_{i|}| > 1 and converges only if |dp_{i|}/p_{i|}| < 1.

¹ A. A. Tschuprow, Grundbegriffe und Grundprobleme der Korrelationstheorie, Leipzig-
Berlin, 1925, p. 85-97. An English translation was prepared by M. Kantorowitsch (Principles
of the Theory of Correlation, W. Hodge & Co., 1939).

² Cf. my Inleiding tot de correlatierekening, Delft, 1935, p. 88-90.


This result is not affected by the procedure of the determination of mathematical
expectations. For if f(p'_{ij}, p'_{i|}) is the probability distribution of (p'_{ij},
p'_{i|}), then we have to multiply (1) by this function and to sum for all possible
values of p'_{ij}, p'_{i|}. As the expressions f(p'_{ij}, p'_{i|})
are always positive, the infinite series, which results from replacing the terms of
(1) by their mathematical expectations, will also be divergent.
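
The behaviour of the binomial series for a single value of the random variable is easily illustrated. The following sketch forms the partial sums of 1 - u + u^2 - ..., the expansion of (1 + u)^{-1}, where u stands for the ratio of a deviation to its probability; the values u = 0.5 and u = 1.5 and the function name are chosen for illustration only.

```python
def binomial_partial_sums(u, terms):
    """Partial sums of 1 - u + u**2 - ..., the binomial series of (1 + u)**(-1)."""
    sums = []
    total = 0.0
    power = 1.0
    for _ in range(terms):
        total += power
        sums.append(total)
        power *= -u
    return sums


# |u| < 1: the partial sums settle down to 1 / (1 + u).
inside = binomial_partial_sums(0.5, 40)

# |u| > 1: the partial sums oscillate with growing amplitude.
outside = binomial_partial_sums(1.5, 40)

if __name__ == "__main__":
    print(inside[-1], 1.0 / 1.5)
    print(outside[-3:])
```

For u = 0.5 the partial sums approach 1/(1 + u), while for u = 1.5 they grow without bound in absolute value, although (1 + u)^{-1} itself is perfectly finite; it is sample values of the second kind that make the series randomly divergent.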

The same argument is true when we consider for instance the mathematical
expectation of the Pearson-Bravais correlation coefficient. Denoting by μ_{11},
μ_{20}, μ_{02} the population values of the product moment and the second order
moments of x and y, and by μ'_{11}, μ'_{20}, μ'_{02} the values observed in a sample, the
mathematical expectation of the correlation coefficient may be found from

(2)    r = \frac{\mu'_{11}}{(\mu'_{20}\, \mu'_{02})^{1/2}} = \frac{\mu_{11} + d\mu_{11}}{(\mu_{20} + d\mu_{20})^{1/2} (\mu_{02} + d\mu_{02})^{1/2}},

where dμ_{11}, dμ_{20}, and dμ_{02} are random variables. Tschuprow expands the
denominator in binomial series. However if dμ_{20} and dμ_{02} take on values such that
|dμ_{20}/μ_{20}| > 1 or |dμ_{02}/μ_{02}| > 1, these series will again be divergent. Analogous
difficulties arise in all other cases where Tschuprow makes use of binomial
expansions.

It should also be remarked that the well-known formulae, given by the Bio- 
metric School for the standard errors of regression and correlation coefficients 
are equal to the mathematical expectation of the first terms of infinite series, 
which, as explained above, are divergent for certain values of the random 
variables. Therefore the question arises as to what effect the divergence for
some of the values of the random variables has on those formulae.

This question can be cleared up by the introduction of Slutsky's conditionally
aleatory variables.³ These are defined as follows. Suppose that an aleatory
variable z can assume the values z_1, z_2, ..., z_n, with probabilities P_1, P_2, ..., P_n.
Now we put some of these probabilities equal to 0, dividing the remaining ones
by 1 - Q, where Q represents the total of the probabilities thus put equal to 0. The
variable z then becomes the so-called conditionally aleatory variable z'. Moreover we
assume that z converges stochastically to some limit. Then Slutsky has shown
that if Q converges to 0, z' will converge to the same stochastical limit as z.
Moreover the ratio of corresponding moments of the distributions of z and z'
will tend to unity.

Now let us consider for example

    z = \frac{p'_{ij}}{p'_{i|}} = \frac{p_{ij} + dp_{ij}}{p_{i|} + dp_{i|}}.

Omitting the values for which |dp_{i|}| > p_{i|}, we get a conditionally aleatory
variable z' instead of z. However, according to the theorem mentioned before
z' and z will converge to the same stochastical limit, since the probability that
|dp_{i|}| > p_{i|} converges to zero as the number of observations
increases indefinitely.

In the same way we consider

    r = \frac{\mu'_{11}}{(\mu'_{20}\, \mu'_{02})^{1/2}} = \frac{\mu_{11} + d\mu_{11}}{(\mu_{20} + d\mu_{20})^{1/2} (\mu_{02} + d\mu_{02})^{1/2}}

and omit those values for which |dμ_{20}| > μ_{20} and |dμ_{02}| > μ_{02}.

If now we consider the binomial expansions for the conditionally aleatory
variables and determine the mathematical expectations of the terms, these new
series will converge. All terms in these convergent series will be smaller than
the corresponding terms in Tschuprow's series, because we have omitted the
larger values of the dp's and the dμ's. However if the number of observations
increases indefinitely the ratios between corresponding terms tend to unity,
because the probabilities that e.g. |dμ_{20}| > μ_{20} or |dμ_{02}| > μ_{02} converge to zero.

Let us now turn again to the infinite series given by Tschuprow (loc. cit. p. 90)
for the square of the standard error of a correlation coefficient.

(3)    \sigma_r^2 = E(r - E(r))^2 = \frac{l_1}{N} + \frac{l_2}{N^2} + \frac{l_3}{N^3} + \cdots

Here l_1, l_2, l_3, ... represent rather lengthy expressions, for which we may
refer to Tschuprow's book (loc. cit. p. 88-90). As we have seen before, this
series is randomly divergent. However, by introducing a conditionally aleatory
variable in the way described above, expanding it into an infinite series and
determining the mathematical expectations of its terms, we get a convergent
series, say:

(4)    \sigma_r'^2 = \frac{l'_1}{N} + \frac{l'_2}{N^2} + \frac{l'_3}{N^3} + \cdots

From Slutsky's theorem, mentioned before, it follows that if N increases the
ratio of σ_r^2 and σ_r'^2 will tend to unity. Moreover, if we take N sufficiently large,
it will always be possible to fulfill the following inequalities:

    \left|\frac{l'_k}{l_k}\right| > 1 - \epsilon_k    (k = 1, 2, ..., n),

where ε_k (k = 1, 2, ..., n) and n are arbitrary. Therefore, when n and N are
sufficiently large the ratio between the first n terms of the infinite series (3)
and the true value of σ_r^2 will differ from 1 by an arbitrarily small number. Though
the series (3) is divergent for any N, however large, the first n terms of this
series will give an approximation of σ_r^2 by taking N sufficiently large.

³ E. Slutsky, "Über stochastische Asymptoten und Grenzwerte," Metron 1925, Vol. V,
No. 3, p. 79. Also my Inleiding tot de correlatierekening, Ch. V and VI.

In this paper we have shown that the procedures which have been followed 
by the Biometric School and Tschuprow to establish formulas for the standard 
errors of correlation and regression coefficients and in analogous problems can 
be made rigorous by the use of conditionally aleatory variables. It was found 
that their infinite expansions are divergent for some of the values of the random 
variables involved, however large the number of observations {N) may be. Yet 
it could be demonstrated, that the first n terms of these series will give an ap- 
proximation, as close as is wanted, if N is sufficiently large. For practical pur- 
poses the case n = 1 is the most important. 


Netherlands Central Bureau of Statistics,
The Hague 


A NOTE ON FIDUCIAL INFERENCE 
By R. A. Fisher

In a recent paper [1] Bartlett has written a further justification of his criticism 
of the test of significance for the difference between means of two samples from 
normal populations not supposedly of equal or related variance. This test was 
originally put forward by W. V. Behrens [2], and later [3] found to be very 
simply derivable by the method of fiducial probability. 

It is unfortunate that Bartlett did not restate his own views on this topic 
without making misleading allusions to mine. Thus, on p. 135 in [1]: 

"It is sufficient to note that the distribution certainly provides us with an exact inference of 
fiducial type, as Fisher himself confirmed [9], p. 375." 

I do not know, and Bartlett does not specify, what unguarded statement of 
mine could be used to justify this assertion. From the time I first introduced 
the word, I have used the term fiducial probability rather strictly, in accordance 
with the basic ideas of the theory of estimation. Several other writers have 
preferred to use it in a wider application, without the restrictions which I think 
are appropriate. To all, I imagine, it implies at least a valid test of significance 
expressible in terms of an unknown parameter, and capable of distinguishing, 
therefore, those values for which the test is significant, from those for which 
it is not. 

Shortly after Bartlett’s alternative approach to the problem was put forward 
[4], I expressed [5] the following opinion. As this occurs prominently in the 
summary, indeed on the very page to which Bartlett refers in his quotation 
above, I cannot suppose he has overlooked it, though evidently he must have 
missed its meaning. I wrote as follows [5] p. 375: 

“The criticism of Behrens’ test of significance, recently put forward by Bartlett, on the 
ground that it differs from a possible alternative test, overlooks the inconsistency of assum- 
ing for the unknown variances both (a) fiducial distributions in accordance with the samples 
observed, and (b) values fixed from sample to sample. 

The alternative test of significance proposed involves, when the variance ratio of the two 
populations sampled is unknown, the choice by lot between the value T, used in Behrens’ 
test, and a second value T', which reverses the order of significance of different possible 
sets of observations. High values of T' are not, therefore, by themselves evidence of 
inequality of the means.” 

I submit that the second paragraph quoted above shows, without further 
argument, that I rejected Bartlett’s proposed test of significance, and therefore 
that I did not confirm his opinion that it provided “an exact inference of fiducial 
type.” Whether my reasons for doing so were strong or weak is, of course, 
another matter. 

What may have led Bartlett to adopt his test of significance is its formal 
similarity to one appropriate to a different problem. In 1908 “Student” in his 
now celebrated paper on “The probable error of a mean” [6] applied his solution 
to what are known as paired observations. Two treatments A and B are 
applied each to one of a number of pairs of plots, or other experimental units, 
the members of each pair being chosen to be in other respects closely comparable, 
although the circumstances of the different pairs are not necessarily closely alike. 
In order to allow for any, possibly large, variations in the conditions prevailing 
in the different pairs, attention is confined to the difference, having regard to 
sign, supplied by each pair. 

Thus, if pairs of measurements $a_1, b_1;\ a_2, b_2;\ \ldots$ are obtained, we may write 

$$d_k = a_k - b_k,$$

and test the hypothesis that the differences d are a normal sample having zero 
mean. This hypothesis will be true if $a_k$ and $b_k$ are distributed, by experimental 
error, in normal distributions having the same mean, even though this mean is 
not the same for different pairs. It will be true if the variances of a and b from the 
hypothetical mean of the pair are unequal, provided these variances are the same 
from pair to pair. These are the reasons which make the hypothesis that d is 
normally distributed about zero appropriate for testing the differential effect of 
the treatments. 

If only two pairs are used, “Student’s” test reduces to 

$$t = \frac{d_1 + d_2}{d_1 - d_2}.$$

There is one degree of freedom, so that t is distributed in Cauchy’s distribution 

$$\frac{1}{\pi}\,\frac{dt}{1 + t^2}.$$
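
The reduction for two pairs can be checked numerically. The sketch below is an illustration added here, not part of the original note: it computes the ordinary one-sample t for two differences, compares it with the closed form above (with the denominator taken as positive, a sign convention assumed for the illustration), and evaluates the upper tail of the Cauchy density. The function names are hypothetical.

```python
import math

def student_t(differences):
    """One-sample Student t for the hypothesis that the mean difference is zero."""
    n = len(differences)
    mean = sum(differences) / n
    # Variance estimate on n - 1 degrees of freedom.
    var = sum((d - mean) ** 2 for d in differences) / (n - 1)
    return mean / math.sqrt(var / n)

def two_pair_t(d1, d2):
    """Closed form for two pairs, taking the denominator as positive (assumed convention)."""
    return (d1 + d2) / abs(d1 - d2)

def cauchy_upper_tail(t):
    """P(T > t) under the Cauchy density (1/pi) dt / (1 + t^2)."""
    return 0.5 - math.atan(t) / math.pi

d1, d2 = 3.0, 1.0
print(student_t([d1, d2]), two_pair_t(d1, d2))   # both forms agree
print(cauchy_upper_tail(two_pair_t(d1, d2)))     # one-sided tail probability
```

With a single degree of freedom no table is needed, since the tail probability is an arctangent.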


If, now, the symbols have a different meaning, so that $a_1$ and $a_2$ are a sample of 
two from a single normal distribution, and $b_1$ and $b_2$ a second sample from a 
different population, having by hypothesis an independent variance, Behrens’ 
problem (limited for comparison with Bartlett to samples of 2) is to test whether 
the two populations can be regarded as having the same mean, or whether there 
is reason to regard the means also as being different. Note that the pairs 1 and 2 
are not supposed to differ in treatment or situation. The difference $a_1 - a_2$ 
is not to be ascribed partly to differences between the hypothetical means of 
these pairs, but wholly to the error variance of the observations a, about which 
it is the only source of information; the like is true of the difference $b_1 - b_2$. 
The sign of these two differences is arbitrary, only their positive values concern 
our problem. There is no real correspondence between the suffices assigned to 
the two pairs of letters. They could be interchanged for a, and not for b, without 
affecting the problem. 

Behrens’ test reduces for this case to taking 

$$T = \frac{a_1 + a_2 - b_1 - b_2}{|a_1 - a_2| + |b_1 - b_2|},$$

using for the probability function, “Student’s” distribution for one degree of 
freedom. Bartlett’s test involves choosing at random between T and T', where 

$$T' = \frac{a_1 + a_2 - b_1 - b_2}{\bigl|\,|a_1 - a_2| - |b_1 - b_2|\,\bigr|}.$$

It will be noticed that, if $|b_1 - b_2| < |a_1 - a_2|$, and if, keeping $b_1 + b_2$ 
constant, $|b_1 - b_2|$ is increased, a change which must make us suspect larger 
errors, and therefore a lower significance, the value of the difference 
$|a_1 - a_2| - |b_1 - b_2|$ is continuously diminished, and that of T' continuously increased, 
without limit. In fact, the probability of exceeding any limit of significance, 
however high, may be made to exceed 50% by this process. The order of 
significance of such a series of possible observations is thus reversed. The 
fact that choosing at random between T and T' will give us a quantity which, on the null 
hypothesis, is distributed in “Student’s” distribution is, thus, insufficient to 
justify its use as a test of significance. 
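
The reversal described above can be seen in a small numerical sketch. This is an illustration added here under assumed values; the numbers and function names are hypothetical and are not taken from the note. The sum of the b observations is held fixed while their spread is widened towards that of the a observations.

```python
def behrens_T(a1, a2, b1, b2):
    """Behrens' statistic for two samples of two."""
    return (a1 + a2 - b1 - b2) / (abs(a1 - a2) + abs(b1 - b2))

def alternative_T_prime(a1, a2, b1, b2):
    """The second value T' entering the alternative test."""
    return (a1 + a2 - b1 - b2) / abs(abs(a1 - a2) - abs(b1 - b2))

def sweep(spreads, a1=10.0, a2=14.0, b_sum=20.0):
    """Return (T, T') for each spread |b1 - b2|, keeping b1 + b2 fixed."""
    results = []
    for spread in spreads:
        b1 = (b_sum - spread) / 2
        b2 = (b_sum + spread) / 2
        results.append((behrens_T(a1, a2, b1, b2),
                        alternative_T_prime(a1, a2, b1, b2)))
    return results

# |a1 - a2| = 4, so every spread below stays smaller than 4.
for spread, (T, T_prime) in zip((0.0, 1.0, 2.0, 3.0, 3.9),
                                sweep((0.0, 1.0, 2.0, 3.0, 3.9))):
    print(f"|b1 - b2| = {spread:3.1f}   T = {T:6.3f}   T' = {T_prime:7.3f}")
```

As the spread of the b observations grows, T falls steadily while T' rises without bound, which is the reversal of the order of significance.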
It is also irrelevant, and this may be at the present time the most important 
point to make, that the sampling distribution of T above is not given by 
“Student’s” distribution, if the population to which statements of probability refer 
is supposed to consist of samples taken repeatedly from populations having a 
fixed variance ratio. Such a supposition, as I noted in the passage quoted above, 
is inconsistent with the fiducial distributions derived from the samples. Bartlett 
comes near to discussing this point on p. 136 in [1]. He says: 

“While Fisher suggests that this in no way invalidates his fiducial argument, in my view 
if an inference is to be independent of an unknown parameter, it should in particular be 
independent of it if we imagine that we are being supplied with pairs of samples, for all of 
which the ratio has the same value.” 

In its natural meaning this statement seems to be true. The problem concerns 
what inferences are legitimate from a unique pair of samples, which supply the 
data, in the light of the suppositions we entertain about their origin; the legiti- 
macy of such inferences cannot be affected by any supposition as to the origin 
of other samples which do not appear in the data. Such a population of samples 
is really extraneous to the discussion. Nor has Bartlett shown that Behrens' 
inference from a unique pair of samples is so affected. What he seems to rely on 
is that an aggregate of samples fulfilling the null hypothesis, but drawn from 
pairs of populations having a fixed variance ratio, will show differences between 
their means exceeding the limits fixed by the test for significance, with a fre- 
quency other than that indicated by the test. This, however, is a circumstance 
common to all the well known tests of significance, and has been obvious from 
their very origin. 

In “Student’s” test for significance, for example, if a sample of n' observations 
is taken from a population normally distributed about zero, we calculate 

$$\bar{x} = \frac{1}{n'}\,S(x), \qquad n = n' - 1, \qquad s^2 = \frac{1}{n}\,S(x - \bar{x})^2,$$

and count $\bar{x}$ as significant, if 

$$\bar{x} > s\,t_n/\sqrt{n'},$$

where $t_n$ is the value of “Student’s” t for n degrees of freedom, corresponding to the 
level of significance chosen. 

However, in repeated samples of n' from a population having a given variance 
it is highly improbable that $\bar{x}$ will exceed the limit assigned with the frequency 
chosen. The limit it will exceed with this frequency is 

$$\sigma\,t_\infty/\sqrt{n'},$$

which will usually differ from that assigned from the sample. This, however, 
has not hitherto been considered an adequate reason for calling the test 
inaccurate, or biased. It is merely a recognition of the fact that, if we did know $\sigma$, 
we could make a better test. Just as, in Behrens’ problem, if we knew the 
relative weights $w$ of the observations in the two samples, we could make a 
weighted “Student’s” test, and should be wise to do so, if the information were 
available. 
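
A small numerical sketch may make the contrast concrete. It is an illustration added here, not part of the original note, and assumes the smallest case n' = 2 (one degree of freedom), a one-sided 5 per cent level, and hypothetical function names. With one degree of freedom the quantile of “Student’s” distribution is available in closed form, so only the Python standard library is needed.

```python
import math
from statistics import NormalDist

def t_quantile_1df(p):
    """Quantile of Student's t with 1 degree of freedom (the Cauchy distribution)."""
    return math.tan(math.pi * (p - 0.5))

def sample_limit(x1, x2, level=0.05):
    """Limit s * t_n / sqrt(n') assigned from one observed sample of n' = 2."""
    mean = (x1 + x2) / 2
    s = math.sqrt((x1 - mean) ** 2 + (x2 - mean) ** 2)  # divisor n = 1
    return s * t_quantile_1df(1 - level) / math.sqrt(2)

def fixed_sigma_limit(sigma, level=0.05):
    """Limit sigma * t_inf / sqrt(n') appropriate when sigma is known."""
    return sigma * NormalDist().inv_cdf(1 - level) / math.sqrt(2)

def exceedance_frequency(limit, sigma):
    """P(mean > limit) in repeated samples of 2 from a normal population N(0, sigma^2)."""
    return 1 - NormalDist(0, sigma / math.sqrt(2)).cdf(limit)

sigma = 1.0
limit_from_sample = sample_limit(-1.0, 1.0)
print(limit_from_sample, exceedance_frequency(limit_from_sample, sigma))
print(fixed_sigma_limit(sigma), exceedance_frequency(fixed_sigma_limit(sigma), sigma))
```

The limit assigned from one particular sample is exceeded, in repeated sampling with the variance held fixed, with a frequency that depends on that sample's own s; only the limit based on the known standard deviation is exceeded with exactly the chosen frequency.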

Naturally, it may be said that although the limit of significance assigned to $\bar{x}$ 
will not be verified in repeated sampling from populations having the same 
variance, the distribution of t will be so verified. In this respect the distribution 
of t in “Student’s” case is analogous to the simultaneous distribution of $t_1$ and $t_2$ 
in Behrens’ case, where 

$$t_1 = \frac{\bar{x}_1 - \mu}{s_1}, \qquad t_2 = \frac{\bar{x}_2 - \mu}{s_2};$$

$\mu$ is the hypothetical common mean of the two populations, and $s_1^2$ and $s_2^2$ are 
the estimated variances of the means of the two samples. The quantity d 
which Sukhatme [7] has conveniently tabulated, in such a way that 

$$d\sqrt{s_1^2 + s_2^2}$$

supplies a significance limit for $\bar{x}_1 - \bar{x}_2$, naturally does not possess the property 
that 

$$\bar{x}_1 - \bar{x}_2 > d\sqrt{s_1^2 + s_2^2}$$

with the probability assigned, in a population consisting of pairs of samples from 
populations having the same variance ratio. 

If the populations were fixed, the corresponding limit would be 

$$t_\infty\sqrt{\sigma_1^2 + \sigma_2^2},$$

and if the variance ratio were fixed so that $w$ is the weight of $\bar{x}_1$ relative to that of 
$\bar{x}_2$, it would be 



provided always, if we wish to express ourselves in terms of repeated sampling, 
that the absolute values of $\sigma_1$, or $\sigma_2$ were fiducially distributed. Behrens’ 
problem refers to the case in which neither the variances nor their ratio is 
known, so that the unknown variances, independently, must be given their 
fiducial distributions. 

In this note I have not touched on the logical background of Behrens’ test, 
or the practical conditions on which it is appropriate, since I have recently 
discussed these more fully [8]. Recently also [9] Yates has given a careful 
explanation of the basis of the test. 


Summary 

The statement of Bartlett that the author (Fisher) has confirmed that Bart- 
lett’s approach to Behrens' problem provides an exact inference of fiducial type 
is incorrect. The only exact test appropriate to his problem seems to be that 
given by Behrens. 
A NOTE ON NEYMAN'S THEORY OF STATISTICAL ESTIMATION 

By Solomon Kullback 

In this note we shall examine a section of a recent paper by Neyman¹ dealing 
with statistical estimation. Consider the following quotation from that section² 
which deals with the statement of the problem: 

“Consider the variables [$x_1, x_2, \ldots, x_n$] and assume that the form of the 
probability law [$p(x_1, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_l)$] is known, that it involves the 
parameters $\theta_1, \theta_2, \ldots, \theta_l$ which are constant (not random variables), and that 
the numerical values of these parameters are unknown. It is desired to estimate 
one of these parameters, say $\theta_1$. By this I shall mean that it is desired to define 
two functions $\bar{\theta}(E)$ and $\underline{\theta}(E) \le \bar{\theta}(E)$, determined and single valued at any point 
E of the sample space, such that if E' is the sample point determined by 
observation, we can (1) calculate the corresponding values of $\underline{\theta}(E')$ and $\bar{\theta}(E')$ and (2), 
state that the true value of $\theta_1$, say $\theta_1^0$, is contained within the limits 

$$\underline{\theta}(E') \le \theta_1^0 \le \bar{\theta}(E') \tag{18}$$

this statement having some intelligible justification on the ground of the theory 
of probability. 


¹ Specifically we refer to J. Neyman, “Outline of a Theory of Statistical Estimation Based 
on the Classical Theory of Probability,” Phil. Trans. Roy. Soc., vol. A236 (1937), pp. 
333-380. 

² J. Neyman, loc. cit., p. 347. The material in brackets represents slight alterations of the 
original text, made so that the quotation does not refer to previous matter in the original 
paper. 
This point requires to be made more precise. Following the routine of thought 
established under the influence of the Bayes Theorem, we could ask that, given 
the sample point E', the probability of $\theta_1^0$ falling within the limits (18) should be 
large, say $\alpha = 0.99$, etc. If we expressed this condition by the formula 

$$P\{\underline{\theta}(E') \le \theta_1^0 \le \bar{\theta}(E') \mid E'\} = \alpha \tag{19}$$

we see at once that it contradicts the assumption that $\theta_1^0$ is constant. In fact, 
on this assumption, whatever the fixed point E' and the values $\underline{\theta}(E')$ and $\bar{\theta}(E')$, 
the only values the probability (19) may possess are zero and unity. For this 
reason we shall drop the specification of the problem as given by the condition 
(19).” 

We believe that the following approach to the problem emphasizes to a greater 
extent the fact that if the practical statistician follows the steps recommended 
as a result of Neyman’s solution, then ‘in the long run he will be correct in about 
100α percent of all cases’. 

Let us return again to the condition (19) of the quotation, and write 

$$\pi(E) = P\{\underline{\theta}(E) \le \theta_1^0 \le \bar{\theta}(E) \mid E\} \tag{1}$$

where of course $\pi(E)$ = zero or unity according as the true value of $\theta_1$, say $\theta_1^0$, 
does not or does satisfy the inequality 

$$\underline{\theta}(E) \le \theta_1^0 \le \bar{\theta}(E). \tag{2}$$

We may however calculate the average value of $\pi(E)$, i.e., the percentage of cases 
in which in the long run the statistician will be correct.³ In accordance with 
the definition of an average 

$$\bar{\pi}(E) = \int_R \pi(E)\,p(E \mid \theta_1^0, \theta_2, \ldots, \theta_l)\,dx_1\,dx_2 \cdots dx_n \tag{3}$$

where the region R is the entire sample space. If we let $R_1$ be that portion of the 
sample space for which (2) is satisfied, then since $\pi(E) = 1$ if E falls in $R_1$ and 
zero otherwise 

$$\bar{\pi}(E) = \int_{R_1} p(E \mid \theta_1^0, \theta_2, \ldots, \theta_l)\,dx_1\,dx_2 \cdots dx_n. \tag{4}$$

Thus, if we want our rule to lead to a correct statement in 100α percent of cases 
in the long run, we must look for two functions $\underline{\theta}(E)$ and $\bar{\theta}(E)$ such that for the 
corresponding region $R_1$ 

$$\bar{\pi}(E) = \int_{R_1} p(E \mid \theta_1^0, \theta_2, \ldots, \theta_l)\,dx_1\,dx_2 \cdots dx_n = \alpha \tag{5}$$

holds good whatever the value $\theta_1^0$ of $\theta_1$ and whatever the values of the other 
parameters $\theta_2, \theta_3, \ldots, \theta_l$ involved in the probability law of the x’s may be. 

³ Cf. A. Wertheimer, “A Note on Confidence Intervals and Inverse Probability,” Annals of 
Math. Statistics, Vol. X (1939), pp. 74ff. 
If we apply to the preceding the calculus of probability in accordance with 
Neyman,⁴ we find that (5) may be written as 

$$\bar{\pi}(E) = P\{\underline{\theta}(E) \le \theta_1^0 \le \bar{\theta}(E) \mid \theta_1^0\} = \alpha \tag{6}$$

which, with the conditions stated for (5) is identical with formula (20) on page 
348 of Neyman’s paper. 
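
The long-run reading of the average of π(E) can be illustrated by simulation. The sketch below is an addition for illustration only and rests on assumptions not in the note: a normal population with known standard deviation, the familiar interval centred on the sample mean, and hypothetical function and parameter names. It averages the zero-or-unity indicator over many samples for two different true values of the parameter.

```python
import math
import random
from statistics import NormalDist

def coverage(theta, sigma=1.0, n=5, alpha=0.95, trials=20000, seed=0):
    """Average of the 0/1 indicator pi(E) over repeated samples of size n."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        # pi(E) is 1 when the interval covers the true value, 0 otherwise.
        hits += (xbar - half_width <= theta <= xbar + half_width)
    return hits / trials

print(coverage(0.0))           # close to alpha
print(coverage(37.5, seed=1))  # close to alpha for a different true value
```

For any single sample the indicator is zero or unity; it is only its average over the sample space that equals α, and it does so whatever the true value may be.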

The George Washington University 
Washington, D. C. 


⁴ J. Neyman, loc. cit., pp. 333-343. 


A NOTE ON A PRIORI INFORMATION 


By C. Eisenhart 


A survey of recent literature on mathematical statistics is sufficient to reveal 
the fact that in approaching certain types of problems some writers assume 
more information known a priori than do other writers. Indeed, it soon becomes 
evident that great care is necessary in wording (and in reading) propositions in 
mathematical statistics. Furthermore, propositions which are true and power- 
ful when certain information is known a priori may become either useless or 
irrelevant according as more, or less, information is available a priori. Once 
this situation is appreciated some apparent contradictions are resolved, and 
certain exceptional examples can “be reasonably regarded as bearing out the 
principle to which formally they are anomalous.” 

So far as I know it was Bartlett [1, p. 271] who first clearly pointed out how a 
slight change in the amount of information known a priori can greatly alter 
the complexion of a problem. He was indebted to Neyman and Pearson 
[5, p. 122] for his problem, which was to develop a test of the statistical 
hypothesis, $H_0$, that $\beta = \beta_0$ and $\gamma = \gamma_0$ for a random sample from the distribution 

$$p(x) = \begin{cases} \beta e^{-\beta(x - \gamma)} & \text{for } x \ge \gamma \\ 0 & \text{for } x < \gamma. \end{cases} \tag{1}$$


If (1) expresses all the information (about the distribution of x) that is to be 
considered as known a priori, any value of $\beta > 0$ and any finite value of $\gamma$ 
being admissible, then it follows immediately from a result of R. A. Fisher’s 
[2, p. 295] that no uniformly most powerful test, in the sense of Neyman and 
Pearson [4; 5, p. 115], can exist for $H_0$, since $H_0$ involves the simultaneous 
testing of two unrelated parameters.¹ 


¹ Since Fisher’s wording is important it will be well to quote him here: “It is evident, 
at once, that such a system [of maximum likelihood relations needed to insure the existence 
of a uniformly most powerful test] is only possible when the class of hypotheses considered 
By assuming that in addition to (1) the a priori information includes the 
knowledge that ^ ^ and $\gamma \le \gamma_0$ constitute the only admissible ranges of 
values of these parameters, Neyman and Pearson [5, p. 122] have succeeded in 
showing that a uniformly most powerful test of $H_0$ does exist when the 
admissible values of $\beta$ and $\gamma$ are restricted in this way. At first this appears to be 
in contradiction to Fisher’s statement referred to above, but Bartlett [1, p. 271] 
points out that the restrictions on the admissible values of $\beta$ and $\gamma$ reduce the 
problem effectively to one of testing a single parameter: In the first place, no 
statistical test is necessary if an observation less than $\gamma_0$ occurs, since this 
refutes the hypothesis $H_0$ immediately. Therefore, a statistical test of $H_0$ is 
needed only when none of the observations are less than $\gamma_0$, and for such 
observations the distribution law is 

$$p(x) = \beta e^{-\beta(x - \gamma_0)}, \qquad x \ge \gamma_0, \tag{2}$$

and is independent of $\gamma$. In consequence, the test reduces to testing the single 
parameter $\beta$ in (2), for which the arithmetic mean, $\bar{x}$, is a sufficient statistic. 
The discovery of a uniformly most powerful test of $H_0$, when the above 
restrictions are placed on the admissible values of $\beta$ and $\gamma$, is, therefore, reasonably 
consistent with the full meaning of Fisher's statement. 
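
The step from (1) to (2) can be checked numerically. The sketch below is an illustration added here and makes two assumptions that should be read as such: that the law (1) is the displaced exponential, and that the admissible values of the location parameter do not exceed the hypothetical value (written `gamma0` in the code). The function names are hypothetical. Under these assumptions the law of an observation, given that it is not less than `gamma0`, does not involve the location parameter at all.

```python
import math

def density(x, beta, gamma):
    """Displaced exponential law, assumed here as the form of (1)."""
    return beta * math.exp(-beta * (x - gamma)) if x >= gamma else 0.0

def conditional_density(x, beta, gamma, gamma0):
    """Law of x given x >= gamma0, for a location parameter gamma <= gamma0."""
    tail = math.exp(-beta * (gamma0 - gamma))  # P(x >= gamma0)
    return density(x, beta, gamma) / tail if x >= gamma0 else 0.0

beta, gamma0, x = 0.7, 2.0, 3.2
for gamma in (-3.0, 0.0, 1.5, 2.0):
    # Each line prints the same value, equal to density(x, beta, gamma0).
    print(gamma, conditional_density(x, beta, gamma, gamma0))
print(density(x, beta, gamma0))
```

Because the conditional law is the same for every admissible location, only the scale parameter remains to be tested.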

The preceding example makes quite clear how a little additional a priori 
knowledge can affect the solution of a problem in mathematical statistics. 
The a priori knowledge employed by writers in mathematical statistics usually 
falls into one of the following categories: 

(i) The elementary probability law is taken to be continuous or discrete, 
as the case may be, but its mathematical form is left unspecified. 

(ii) The elementary probability law is taken to be of a definite mathematical 
form involving one or more parameters the value(s) of which is (are) not 
considered as known a priori, and any value(s) of this (these) parameter(s) 
consistent with the non-negative character of a probability law is (are) 
admissible. 

(iii) Here the information assumed known is as in (ii) except that the 
admissible values of the parameter(s) form (a) restricted sub-set (or sub-sets) 
of the values admissible in (ii), such sub-sets, however, being comprised of 
more than a single value. 

(iv) The information is so complete that the admissible values of the 
parameter(s) have (a) known a priori probability distribution(s); if a parameter 
$\theta$ is known to have a definite value $\theta'$, then the a priori probability law 
of $\theta$ can be taken as (Prob. $\theta$ equals $\theta'$) = 1, (Prob. $\theta$ not equal to $\theta'$) = 0.² 


involves only a single parameter $\theta$, or, what comes to the same thing, when all the 
parameters entering into the specification of the population are definite functions of one of 
their number. In this case, the regions defined by the uniformly most powerful test of 
significance are those defined by the estimate of the maximum likelihood, T. For the test 
to be uniformly most powerful, moreover, these regions must be independent of $\theta$, showing 
that the statistic must be of the special type distinguished as sufficient.” (Words in 
square brackets are mine. C. E.) 

As statistical theory advances it may become necessary to classify problems 
according to the amount of information which may be assumed known a priori, 
when proceeding to their solution. No claim is made here that the above 
categories are the best to choose, but it may prove fruitful to study the extent to 
which results obtained with a certain amount of information assumed known are 
useful when more, less, or perhaps different, information is taken as known 
a priori. In particular, as the preceding example shows, it may be well to 
investigate exactly what are the implications of restricting the ranges of the 
admissible values of parameters. 

It is unwise to attempt to predict the outcome of such research at this time, 
but it is probably safe to say that an increase in a priori information will gen- 
erally render possible better tests of significance — better in the sense that, for a 
given probability of rejecting the hypothesis tested when true, the probability of 
rejecting it when false will be greater — and narrower confidence intervals for a 
given confidence level. The example already given, concerned with a test of 
significance, supports this conjecture. As a further example, from the point of 
view of estimation, we may recall that it is possible with a level of confidence 
equal to .96875 to assert [3, p. 4] that the true median of the population from 
which a random sample of 6 was drawn lies within the observed range of the 
sample, and this without any assumption about the population except that it is 
continuous. If, however, the population is known to be of normal form with 
unknown mean, m, and standard deviation, $\sigma$, then Student’s t will provide the 
narrowest confidence intervals for the median of the population, since t provides 
[6, p. 378] the best available confidence intervals for the mean, m, (which is also 
the median) of a normal population when $\sigma$ is unknown; if the population is 
normal and $\sigma$ is known, then the normal deviate $(\bar{x} - m)\sqrt{6}/\sigma$ will supply still 
narrower confidence limits for m. 
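
The figure .96875 quoted above for a sample of 6 can be verified directly. The observed range fails to cover the median of a continuous population only when all the observations fall on the same side of it, an event of probability 2(1/2)^n. The short sketch below is an illustration added here, with a hypothetical function name, and evaluates the complement exactly.

```python
from fractions import Fraction

def range_confidence(n):
    """Probability that the range of n observations covers the population median."""
    # All n observations fall on one side of the median with probability 2 * (1/2)^n.
    return 1 - 2 * Fraction(1, 2) ** n

print(range_confidence(6), float(range_confidence(6)))  # 31/32 0.96875
```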

In conclusion, the circumstances under which it may be desired to apply 
methods of statistical inference may differ considerably in the amount of 
knowledge available to the research worker a priori, and the most efficient tests of 
significance and methods of estimation applicable to a given case will depend 
upon the nature of the available information as described in the above 
classification. In comparing the procedures of different writers, therefore, it is most 
important to examine their premises and see how much information each is 
considering as known at the start. 
A NOTE ON COMPUTATION FOR ANALYSIS OF VARIANCE 
By Morris C. Bishop 

The method of computation for analysis of variance commonly favored is one 
which involves obtaining the total and total sum of squares in a single operation 
on a computing or card-punch machine,¹ in which case a check on the accuracy 
of the work requires complete recomputation. But the best tools available 
to the student, and sometimes to the experimenter, are a table of squares and 
perhaps a listing machine. In such a situation, a simple algorithm which 
embodies checks on the computations is urgently needed. The method here 
presented reduces the arithmetic to repeated application of a single procedure, 
with adequate checks; it reveals rather than obscures the sample variances, 
which may or may not be of primary importance; and it provides an intuitively 
logical portrayal of the step-by-step improvement of the estimate of population 
variance. 

The data items and their squares may be merged into a single table by setting 
them down in staggered fashion, as shown in Table I. If only a single criterion 
of classification is to be used (classified into columns, say) the columns are 
summed down, and then these totals across (obtained as two sets of subtotals 
and totals on a listing machine). This yields the grand total (T) and total 
sum of squares $\left(\sum_{i=1}^{N}\sum_{j=1}^{k} x_{ij}^2\right)$. Summing across and down verifies the addition 
and provides material for two-way classification. The total sum of squares of 
deviations is obtained by the familiar formula 

$$\sum_{i=1}^{N}\sum_{j=1}^{k} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{N}\sum_{j=1}^{k} x_{ij}^2 - \frac{T^2}{Nk} \tag{1}$$

where Nk is the total number of observations in N rows and k columns. 
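
The arithmetic of formula (1), together with the check obtained by summing both down and across, can be sketched as follows. This is an illustration added here with made-up data and hypothetical function names; it is not the staggered layout of Table I, only the sums that layout produces.

```python
def column_and_row_totals(table):
    """Totals obtained by summing down the columns and across the rows."""
    column_totals = [sum(col) for col in zip(*table)]
    row_totals = [sum(row) for row in table]
    return column_totals, row_totals

def total_sum_of_squares(table):
    """Sum of squares of deviations by formula (1): raw sum of squares minus T^2 / (N k)."""
    n_rows, k = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    raw_squares = sum(x * x for row in table for x in row)
    return raw_squares - grand_total ** 2 / (n_rows * k)

def direct_sum_of_squares(table):
    """The same quantity computed from deviations about the general mean."""
    values = [x for row in table for x in row]
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

data = [[3, 5, 4],
        [6, 8, 7]]  # N = 2 rows, k = 3 columns (illustrative values)
cols, rows = column_and_row_totals(data)
assert sum(cols) == sum(rows)  # summing across and down verifies the addition
print(total_sum_of_squares(data), direct_sum_of_squares(data))  # 17.5 17.5
```

The two routes to the grand total supply the check on the additions, and the row and column totals are retained for a two-way classification.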


¹ See George W. Snedecor, Analysis of Variance and Covariance, and Paul R. Rider, 
Modern Statistical Methods. 



